METHOD FOR SIMULTANEOUS IDENTIFICATION
OF DIFFERENTIALLY EXPRESSED mRNAs AND MEASUREMENT
OF RELATIVE CONCENTRATIONS

BACKGROUND OF THE INVENTION
This invention is directed to methods for simultaneous identification of 
differentially expressed mRNAs, as well as measurements of their relative 
concentrations. 

An ultimate goal of biochemical research ought to be a complete characterization
of the protein molecules that make up an organism. This would include their
identification, sequence determination, demonstration of their anatomical sites of
expression, elucidation of their biochemical activities, and understanding of how these
activities determine organismic physiology. For medical applications, the description
should also include information about how the concentration of each protein changes in
response to pharmaceutical or toxic agents.

Let us consider the scope of the problem: How many genes are there? The issue 
of how many genes are expressed in a mammal is still unsettled after at least two decades 
of study. There are few direct studies that address patterns of gene expression in
different tissues. Mutational load studies (J.O. Bishop, "The Gene Numbers Game," Cell
2:81-86 (1974); T. Ohta & M. Kimura, "Functional Organization of Genetic Material as a
Product of Molecular Evolution," Nature 223:118-119 (1971)) have suggested that there
are between 3 x 10^4 and 10^5 essential genes.

Before cDNA cloning techniques, information on gene expression came from
RNA complexity studies: analog measurements (measurements in bulk) based on
observations of mixed populations of RNA molecules with different specificities and
abundances. To an unexpected extent, early analog complexity studies were distorted by
hidden complications of the fact that the molecules in each tissue that make up most of
its mRNA mass comprise only a small fraction of its total complexity. Later, cDNA
cloning allowed digital measurements (i.e., sequence-specific measurements on
individual species) to be made; hence, more recent concepts about mRNA expression are
based upon actual observations of individual RNA species.

Brain, liver, and kidney are the mammalian tissues that have been most 
extensively studied by analog RNA complexity measurements. The lowest estimates of 
complexity are those of Hastie and Bishop (N.D. Hastie & J. B. Bishop, "The Expression 
of Three Abundance Classes of Messenger RNA in Mouse Tissues," Cell 9:761-774 
(1976)), who suggested that 26 x 10^6 nucleotides of the 3 x 10^9 base pair rodent genome
were expressed in brain, 23 x 10^6 in liver, and 22 x 10^6 in kidney, with nearly complete
overlap in RNA sets. This indicates a very minimal number of tissue-specific mRNAs. 
However, experience has shown that these values must clearly be underestimates, 
because many mRNA molecules, which were probably of abundances below the 
detection limits of this early study, have been shown to be expressed in brain but 
detectable in neither liver nor kidney. Many other researchers (J.A. Bantle & W.E. 
Hahn, "Complexity and Characterization of Polyadenylated RNA in the Mouse Brain," 
Cell 8:139-150 (1976); D.M. Chikaraishi, "Complexity of Cytoplasmic Polyadenylated 
and Non-Adenylated Rat Brain Ribonucleic Acids," Biochemistry 18:3249-3256 (1979)) 
have measured analog complexities of between 100 x 10^6 and 200 x 10^6 nucleotides in brain,
and 2- to 3-fold lower estimates in liver and kidney. Of the brain mRNAs, 50-65% are detected
in neither liver nor kidney. These values have been supported by digital cloning studies
(R.J. Milner & J.G. Sutcliffe, "Gene Expression in Rat Brain," Nucl. Acids Res.
11:5497-5520 (1983)).

Analog measurements on bulk mRNA suggested that the average mRNA length
was between 1400 and 1900 nucleotides. In a systematic digital analysis of brain mRNA
length using 200 randomly selected brain cDNAs to measure RNA size by northern
blotting (Milner & Sutcliffe, supra), it was found that, when the mRNA size data were
weighted for RNA prevalence, the average length was 1790 nucleotides, the same as that 
determined by analog measurements. However, the mRNAs that made up most of the 
brain mRNA complexity had an average length of 5000 nucleotides. Not only were the 
rarer brain RNAs longer, but they tended to be brain specific, while the more prevalent 
brain mRNAs were more ubiquitously expressed and were much shorter on average. 

These concepts about mRNA lengths have been corroborated more recently from
the lengths of brain mRNAs whose sequences have been determined (J.G. Sutcliffe,
"mRNA in the Mammalian Central Nervous System," Annu. Rev. Neurosci. 11:157-198
(1988)). Thus, the 1-2 x 10^8 nucleotide complexity and 5000-nucleotide average mRNA
length calculate to an estimated 30,000 mRNAs expressed in the brain, of which about
2/3 are not detected in liver or kidney. Brain apparently accounts for a considerable
portion of the tissue-specific genes of mammals. Most brain mRNAs are expressed at
low concentration. There are no total-mammal mRNA complexity measurements, nor is
it yet known whether 5000 nucleotides is a good mRNA-length estimate for non-neural
tissues. A reasonable estimate of total gene number might be between 50,000 and
100,000.
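
The arithmetic behind the 30,000-mRNA estimate can be sketched as follows. This is only an illustration of the division described above (sequence complexity divided by average mRNA length); taking the midpoint of the resulting range is an assumption made here to reproduce the quoted figure, and the function name is hypothetical.

```python
# Figures taken from the text above; the midpoint is an illustrative assumption.
COMPLEXITY_RANGE_NT = (1e8, 2e8)   # brain mRNA complexity, in nucleotides
AVG_MRNA_LENGTH_NT = 5000          # average length of the complexity-dominating mRNAs


def estimate_mrna_count(complexity_nt, avg_length_nt):
    """Number of distinct mRNA species = total complexity / average length."""
    return complexity_nt / avg_length_nt


low = estimate_mrna_count(COMPLEXITY_RANGE_NT[0], AVG_MRNA_LENGTH_NT)
high = estimate_mrna_count(COMPLEXITY_RANGE_NT[1], AVG_MRNA_LENGTH_NT)
midpoint = (low + high) / 2

# About two-thirds of the brain mRNAs are not detected in liver or kidney.
not_in_liver_or_kidney = midpoint * 2 / 3

print(low, high, midpoint, not_in_liver_or_kidney)
```

Under these inputs the range is 20,000 to 40,000 species, so the 30,000 figure sits at its center.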

What is most needed to advance a chemical understanding of physiological
function is a menu of protein sequences encoded by the genome plus the cell types in 
which each is expressed. At present, protein sequences can be reliably deduced only 
from cDNAs, not from genes, because of the presence of the intervening sequences 
(introns) in the genomic sequences. Even the complete nucleotide sequence of a 
mammalian genome will not substitute for characterization of its expressed sequences. 
Therefore, a systematic strategy for collecting transcribed sequences and demonstrating 
their sites of expression is needed. Such a strategy would be of particular use in 
determining sequences expressed differentially within the brain. It is necessarily an 
eventual goal of such a study to achieve closure; that is, to identify all mRNAs. Closure 
can be difficult to obtain due to the differing prevalence of various mRNAs and the large 
number of distinct mRNAs expressed by many distinct tissues. The effort to obtain it 
allows one to obtain a progressively more reliable description of the dimensions of gene 
space. 

Studies carried out in the laboratory of Craig Venter (M.D. Adams et al., 
"Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome 
Project," Science 252:1651-1656 (1991); M.D. Adams et al., "Sequence Identification of 
2,375 Human Brain Genes," Nature 355:632-634 (1992)) have resulted in the isolation of 
randomly chosen cDNA clones of human brain mRNAs, the determination of short
single-pass sequences of their 3'-ends, about 300 base pairs, and a compilation of some
2500 of these as a database of "expressed sequence tags." This database, while useful,
fails to provide any knowledge of differential expression. It is therefore important to be
able to recognize genes based on their overall pattern of expression within regions of
brain and other tissues and in response to various paradigms, such as various
physiological or pathological states or the effects of drug treatment, rather than simply
their expression in a single tissue.

Other work has focused on the use of the polymerase chain reaction (PCR) to
establish a database. Williams et al. (J.G.K. Williams et al., "DNA Polymorphisms
Amplified by Arbitrary Primers Are Useful as Genetic Markers," Nucl. Acids Res.
18:6531-6535 (1990)) and Welsh & McClelland (J. Welsh & M. McClelland, "Genomic
Fingerprinting Using Arbitrarily Primed PCR and a Matrix of Pairwise Combinations of
Primers," Nucl. Acids Res. 18:7213-7218 (1990)) showed that single 10-mer primers of
arbitrarily chosen sequences, i.e., any 10-mer primer off the shelf, when used for PCR
with complex DNA templates such as human, plant, yeast, or bacterial genomic DNA,
gave rise to an array of PCR products. The priming events were demonstrated to involve
incomplete complementarity between the primer and the template DNA. Presumably,
partially mismatched primer-binding sites are randomly distributed through the genome.
Occasionally, two of these sites in opposing orientation were located closely enough
together to give rise to a PCR product band. There were on average 8-10 products,
which varied in size from about 0.4 to about 4 kb and had different mobilities for each
primer. The array of PCR products exhibited differences among individuals of the same
species. These authors proposed that the single arbitrary primers could be used to
produce restriction fragment length polymorphism (RFLP)-like information for genetic
studies. Others have applied this technology (S.R. Woodward et al., "Random Sequence
Oligonucleotide Primers Detect Polymorphic DNA Products Which Segregate in Inbred
Strains of Mice," Mamm. Genome 3:73-78 (1992); J.H. Nadeau et al., "Multilocus
Markers for Mouse Genome Analysis: PCR Amplification Based on Single Primers of
Arbitrary Nucleotide Sequence," Mamm. Genome 3:55-64 (1992)).

Two groups (J. Welsh et al., "Arbitrarily Primed PCR Fingerprinting of RNA,"
Nucl. Acids Res. 20:4965-4970 (1992); P. Liang & A.B. Pardee, "Differential Display of
Eukaryotic Messenger RNA by Means of the Polymerase Chain Reaction," Science
257:967-971 (1992)) adapted the method to compare mRNA populations. In the study of
Liang and Pardee, this method, called mRNA differential display, was used to compare
the population of mRNAs expressed by two related cell types, normal and tumorigenic
mouse A31 cells. For each experiment, they used one arbitrary 10-mer as the 5'-primer
and an oligonucleotide complementary to a subset of poly A tails as a 3' anchor primer,
performing PCR amplification in the presence of 35S-dNTPs on cDNAs prepared from
the two cell types. The products were resolved on sequencing gels and 50-100 bands
ranging from 100-500 nucleotides were observed. The bands presumably resulted from
amplification of cDNAs corresponding to the 3'-ends of mRNAs that contain the
complement of the 3' anchor primer and a partially mismatched 5' primer site, as had
been observed on genomic DNA templates. For each primer pair, the pattern of bands
amplified from the two cDNAs was similar, with the intensities of about 80% of the
bands being indistinguishable. Some of the bands were more intense in one or the other
of the PCR samples; a few were detected in only one of the two samples.

Further studies (P. Liang et al., "Distribution and Cloning of Eukaryotic mRNAs
by Means of Differential Display: Refinements and Optimization," Nucl. Acids Res.
21:3269-3275 (1993)) have demonstrated that the procedure works with low
concentrations of input RNA (although it is not quantitative for rarer species), and the
specificity resides primarily in the last nucleotide of the 3' anchor primer. At least a
third of identified differentially detected PCR products correspond to differentially
expressed RNAs, with a false positive rate of at least 25%.

If all of the 50,000 to 100,000 mRNAs of the mammal were accessible to this
arbitrary-primer PCR approach, then about 80-95 5' arbitrary primers and 12 3' anchor
primers would be required in about 1000 PCR panels and gels to give a likelihood,
calculated by the Poisson distribution, that about two-thirds of these mRNAs would be
identified.
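
One way to see where a figure of about two-thirds could come from is the Poisson zero-class formula. The sketch below assumes that every displayed band samples an mRNA uniformly at random, and it pairs the upper figures quoted above (100 bands per panel, 100,000 mRNAs) so that the mean number of hits per mRNA is 1. Both the uniform-sampling model and that pairing are assumptions made for illustration; the source does not spell out its calculation, and the function name is hypothetical.

```python
import math


def poisson_coverage(n_panels, bands_per_panel, n_mrnas):
    """Expected fraction of mRNAs displayed at least once.

    Assumes each band corresponds to a uniformly random mRNA, so the number
    of bands per mRNA is Poisson-distributed with mean lam.
    """
    lam = n_panels * bands_per_panel / n_mrnas
    return 1.0 - math.exp(-lam)   # 1 - P(zero hits)


# Roughly 85 arbitrary primers x 12 anchor primers is about 1000 panels.
coverage = poisson_coverage(n_panels=1000, bands_per_panel=100, n_mrnas=100_000)
print(round(coverage, 3))  # 0.632
```

With a mean of one band per mRNA, the expected coverage is 1 - 1/e, i.e. a little under two-thirds. Pairing 50 bands per panel with 50,000 mRNAs gives the same mean.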

It is unlikely that all mRNAs are amenable to detection by this method for the
following reasons. For an mRNA to surface in such a survey, it must be prevalent
enough to produce a signal on the autoradiograph and contain a sequence in its 3'-terminal
500 nucleotides capable of serving as a site for mismatched primer binding and priming. The
more prevalent an individual mRNA species, the more likely it would be to generate a
product. Thus, prevalent species may give bands with many different arbitrary primers.
Because this latter property would contain an unpredictable element of chance based on
selection of the arbitrary primers, it would be difficult to approach closure by the
arbitrary primer method. Also, for the information to be portable from one laboratory to
another and reliable, the mismatched priming must be highly reproducible under
different laboratory conditions using different PCR machines, with the resulting slight
variation in reaction conditions. As the basis for mismatched priming is poorly
understood, this is a drawback of building a database from data obtained by the Liang &
Pardee differential display method.

There is therefore a need for an improved method of differential display of
mRNA species that reduces the uncertain aspect of 5'-end generation and allows data to
be absolutely reproducible in different settings. Preferably, such a method does not
depend on potentially irreproducible mismatched priming. Preferably, such a method
reduces the number of PCR panels and gels required for a complete survey and allows
double-strand sequence data to be rapidly accumulated. Preferably, such an improved
method also reduces, if not eliminates, the number of concurrent signals obtained from
the same species of mRNA.

SUMMARY

We have developed an improved method for the simultaneous sequence-specific 
identification of mRNAs in a mRNA population. In general, this method comprises: 

(1) preparing double-stranded cDNAs from a mRNA population using a
mixture of 12 anchor primers, the anchor primers each including: (i) a tract of from 7 to
40 T residues; (ii) a site for cleavage by a restriction endonuclease that recognizes more
than six bases, the site for cleavage being located to the 5'-side of the tract of T residues;
(iii) a first stuffer segment of from 4 to 40 nucleotides, the first stuffer segment being
located to the 5'-side of the site for cleavage by the restriction endonuclease; and (iv)
phasing residues -V-N located at the 3' end of each of the anchor primers, wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a
deoxyribonucleotide selected from the group consisting of A, C, G, and T, the mixture
including anchor primers containing all possibilities for V and N;

(2) producing cloned inserts from a suitable host cell that has been
transformed by a vector, the vector having the cDNA sample that has been cleaved with a
first restriction endonuclease and a second restriction endonuclease inserted therein, the
cleaved cDNA sample being inserted in the vector in an orientation that is antisense with
respect to a bacteriophage-specific promoter within the vector, the first restriction
endonuclease recognizing a four-nucleotide sequence and the second restriction
endonuclease cleaving at a single site within each member of the mixture of anchor
primers;

(3) generating linearized fragments of the cloned inserts by digestion with 
at least one restriction endonuclease that is different from the first and second restriction 
endonucleases; 

(4) generating a cRNA preparation of antisense cRNA transcripts by
incubation of the linearized fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the bacteriophage-specific promoter;

(5) dividing the cRNA preparation into sixteen subpools and transcribing
first-strand cDNA from each subpool, using a thermostable reverse transcriptase and one
of sixteen 5'-RT primers whose 3'-terminus is -N-N, wherein N is one of the four
deoxyribonucleotides A, C, G, or T, the 5'-RT primer being at least 15 nucleotides in
length, corresponding in sequence to the 3'-end of the bacteriophage-specific promoter,
and extending across into at least the first two nucleotides of the cRNA, the mixture
including all possibilities for the 3'-terminal two nucleotides;

(6) using the product of transcription in each of the sixteen subpools as a
template for a polymerase chain reaction with a 3'-PCR primer that corresponds in
sequence to a sequence in the vector adjoining the site of insertion of the cDNA sample
in the vector and a 5'-PCR primer selected from the group consisting of: (i) the 5'-RT
primer from which first-strand cDNA was made for that subpool; (ii) the 5'-RT primer
from which the first-strand cDNA was made for that subpool extended at its 3'-terminus
by an additional residue -N, where N can be any of A, C, G, or T; and (iii) the 5'-RT
primer used for the synthesis of first-strand cDNA for that subpool extended at its
3'-terminus by two additional residues -N-N, wherein N can be any of A, C, G, or T, to
produce polymerase chain reaction amplified fragments; and

(7) resolving the polymerase chain reaction amplified fragments by
electrophoresis to display bands representing the 3'-ends of mRNAs present in the
sample.
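
The combinatorics of steps (1), (5) and (6) can be enumerated with a short sketch. It only lists the variable 3'-terminal residues described above; the variable and function names are hypothetical, and "CT" is an arbitrary example subpool.

```python
from itertools import product

BASES = "ACGT"     # N: any of the four deoxyribonucleotides
V_BASES = "ACG"    # V: A, C, or G (never T)

# Step (1): phasing residues -V-N of the anchor primers.
anchor_phasing = [v + n for v, n in product(V_BASES, BASES)]

# Step (5): 3'-termini -N-N of the 5'-RT primers, one per subpool.
rt_termini = ["".join(p) for p in product(BASES, repeat=2)]


def pcr_termini(rt_terminus, extra):
    """Step (6): 3'-termini of 5'-PCR primers that extend one RT primer
    by `extra` additional residues (0, 1 or 2 for options (i)-(iii))."""
    return [rt_terminus + "".join(p) for p in product(BASES, repeat=extra)]


print(len(anchor_phasing), len(rt_termini))          # 12 16
print([len(pcr_termini("CT", k)) for k in (0, 1, 2)])  # [1, 4, 16]
```

The counts match the text: 12 anchor primers, sixteen subpools, and 1, 4 or 16 possible 5'-PCR primers per subpool depending on which of options (i) to (iii) is used.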

In another preferred embodiment, the method comprises the steps of: 
(a) preparing a double-stranded cDNA population from an mRNA
population using a mixture of anchor primers, the anchor primers each including: (i) a
tract of from 7 to 40 T residues; (ii) a site for cleavage by a first restriction endonuclease
that recognizes more than six bases, the site for cleavage being located to the 5'-side of
the tract of T residues; (iii) a first stuffer segment of from 4 to 40 nucleotides, the first
stuffer segment being located to the 5'-side of the site for cleavage by the first restriction
endonuclease; and (iv) phasing residues located at the 3' end of each of the anchor
primers selected from the group consisting of -V and -V-N, wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a
deoxyribonucleotide selected from the group consisting of A, C, G, and T, the mixture
including anchor primers containing all possibilities for V and N where the phasing
residues in the mixture are defined by one of -V or -V-N;

(b) cleaving the double-stranded cDNA population with the first
restriction endonuclease and with a second restriction endonuclease, the second
restriction endonuclease recognizing a four-nucleotide sequence, to form a population of
double-stranded cDNA molecules having first and second termini, respectively;

(c) inserting the double-stranded cDNA molecules from step (b) each 
into a vector in an orientation that is antisense with respect to a bacteriophage-specific 
promoter within the vector to form a population of vectors containing the inserted cDNA 
molecules, said inserting defining 3' and 5' flanking vector sequences such that 5' is 
upstream from the sense strand of the inserted cDNA and 3' is downstream of the sense 
strand, and said vector having a 3' flanking nucleotide sequence of from at least 15 
nucleotides in length between said first restriction endonuclease site and a site defining 
transcription initiation in said promoter; 

(d) generating linearized fragments containing the inserted cDNA 
molecules by digestion of the vectors produced in step (c) with at least one restriction 
endonuclease that does not recognize sequences in the inserted cDNA molecules or in the 
bacteriophage-specific promoter, but does recognize sequences in the vector such that the 
resulting linearized fragments have a 5' flanking vector sequence of at least 15 
nucleotides 5' to the site of insertion of the cDNA sample into the vector at the cDNA's 
second terminus; 

(e) generating a cRNA preparation of antisense cRNA transcripts by 
incubation of the linearized fragments with a bacteriophage-specific RNA polymerase 
capable of initiating transcription from the bacteriophage-specific promoter; 

(f) dividing the cRNA preparation into subpools and transcribing
first-strand cDNA from each subpool, using a reverse transcriptase and one of the 5'-RT
primers defined as having a 3'-terminus consisting of -N_x, wherein "N" is one of the four
deoxyribonucleotides A, C, G, or T, and "x" is an integer from 1 to 5, the 5'-RT primer
being 15 to 30 nucleotides in length and complementary to the 5' flanking vector
sequence with the 5'-RT primer's complementarity extending across into the
insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x", wherein a
different one of said 5'-RT primers is used in different subpools and wherein there are 4
subpools if "x" = 1, 16 subpools if "x" = 2, 64 subpools if "x" = 3, 256 subpools if "x" =
4, and 1,024 subpools if "x" = 5;

(g) using the product of first-strand cDNA transcription in each of the
subpools as a template for a polymerase chain reaction with a 3'-PCR primer of 15 to 30
nucleotides in length that is complementary to 3' flanking vector sequences between said
first restriction endonuclease site and the site defining transcription initiation by the
bacteriophage-specific promoter and a 5'-PCR primer having a 3'-terminus consisting of
-N_x-N_y, where "N" and "x" are as in step (f), -N_x is the same sequence as in the 5'-RT
primer from which first-strand cDNA was made for that subpool, and "y" is a whole
integer such that x + y equals an integer selected from the group consisting of 3, 4, 5 and
6, the primer being 15 to 30 nucleotides in length and complementary to the 5' flanking
vector sequence with the 5'-PCR primer's complementarity extending across into the
insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x + y", to
produce polymerase chain reaction amplified fragments; and

(h) resolving the polymerase chain reaction amplified fragments to
generate a display of sequence-specific products representing the 3'-ends of different
mRNAs present in the mRNA population.
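
The subpool counts stated in step (f) follow from 4^x, and each subpool in step (g) admits 4^y distinct 5'-PCR primers. A minimal sketch of that bookkeeping, with the constraints on x and x + y taken from steps (f) and (g), is given below; the function names are hypothetical.

```python
def subpool_count(x):
    """Number of subpools in step (f): one per possible -N_x terminus."""
    if not 1 <= x <= 5:
        raise ValueError("x must be an integer from 1 to 5")
    return 4 ** x


def pcr_primers_per_subpool(x, y):
    """Number of distinct -N_x-N_y termini sharing one fixed -N_x in step (g)."""
    if x + y not in (3, 4, 5, 6):
        raise ValueError("x + y must be 3, 4, 5 or 6")
    return 4 ** y


def total_pcr_reactions(x, y):
    """Total sequence-specific PCR reactions across all subpools: 4^(x + y)."""
    return subpool_count(x) * pcr_primers_per_subpool(x, y)


print([subpool_count(x) for x in range(1, 6)])  # [4, 16, 64, 256, 1024]
print(total_pcr_reactions(2, 2))                # 256
```

For example, x = 2 and y = 2 gives 16 subpools, 16 5'-PCR primers per subpool, and 256 reactions in all.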

Typically, the anchor primers each have 18 T residues in the tract of T residues,
and the first stuffer segment of the anchor primers is 14 residues in length. A suitable
sequence for the first stuffer segment is A-A-C-T-G-G-A-A-G-A-A-T-T-C (SEQ ID NO:
1). Typically, the site for cleavage by a first restriction endonuclease that recognizes
more than six bases is the NotI cleavage site. Suitable anchor primers can also comprise
a second stuffer segment interposed between the site for cleavage by a first restriction
endonuclease that recognizes more than six bases and the tract of T residues. Phasing
residues that are at the 3' end of the anchor primer and 3' to the tract of T residues are
chosen from the group consisting of -V and -V-N, where V is a deoxyribonucleotide
selected from the group consisting of A, C, and G; and N is a deoxyribonucleotide
selected from the group consisting of A, C, G, and T.

In one preferred embodiment, the anchor primer has the sequence A-A-C-T-G-G- 
A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T- 
T-T-T-V-N (SEQ ID NO: 2), including a first stuffer segment of A-A-C-T-G-G-A-A-G- 
A-A-T-T-C that is 5' to the NotI site G-C-G-G-C-C-G-C, a second stuffer sequence
A-G-G-A-A interposed between the restriction endonuclease cleavage site and the tract of T
residues, and phasing residues -V-N. In other preferred embodiments, the phasing
residues of the anchor primer used in step (a) are -V.
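
The composition of the preferred anchor primer can be checked by assembling it from the parts named above. The sketch below builds the twelve -V-N variants from the first stuffer segment, the NotI site, the second stuffer segment and an 18-residue T tract; the constant and function names are hypothetical.

```python
from itertools import product

STUFFER_1 = "AACTGGAAGAATTC"   # first stuffer segment (SEQ ID NO: 1), 14 residues
NOTI_SITE = "GCGGCCGC"         # NotI recognition site
STUFFER_2 = "AGGAA"            # second stuffer segment
T_TRACT = "T" * 18             # tract of 18 T residues


def anchor_primers():
    """All anchor primers of the SEQ ID NO: 2 form, one per -V-N combination."""
    core = STUFFER_1 + NOTI_SITE + STUFFER_2 + T_TRACT
    return [core + v + n for v, n in product("ACG", "ACGT")]


primers = anchor_primers()
print(len(primers), len(primers[0]))  # 12 47
print(primers[0])
```

Each primer is 47 nucleotides long (14 + 8 + 5 + 18 + 2), and the mixture contains 3 x 4 = 12 members.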

Typically the first restriction endonuclease that recognizes more than six bases is
selected from the group consisting of AscI, BaeI, FseI, NotI, PacI, PmeI, PpuMI, RsrII,
SapI, SexAI, SfiI, SgfI, SgrAI, SrfI, Sse8387I and SwaI. Typically the second
restriction endonuclease recognizing a four-nucleotide sequence is selected from the
group consisting of MboI, DpnII, Sau3AI, Tsp509I, HpaII, BfaI, Csp6I, MseI, HhaI,
NlaIII, TaqI, MspI, MaeII and HinP1I.

Typically the value of "x" in step (f) is 1 or 2. Typically the value of "y" in step
(g) is 3 or 4. In one embodiment, the phasing residues in step (a) are -V-N, the "x" in
step (f) is 2, and the "y" in step (g) is 2. In another embodiment, the phasing residues in
step (a) are -V-N, the "x" in step (f) is 1, and the "y" in step (g) is 3. In another
embodiment, the phasing residues in step (a) are -V-N, the "x" in step (f) is 1, and the "y"
in step (g) is 4.

In another embodiment, the phasing residues in step (a) are -V, the "x" in step (f) is
1, and the "y" in step (g) is 3. In a further embodiment, the phasing residues in step (a)
are -V, the "x" in step (f) is 1, and the "y" in step (g) is 4.

Typically, the anchor primers each have 18 T residues in the tract of T residues,
and the first stuffer segment of the anchor primers is 14 residues in length.

Suitable vectors are pBC SK+ and pBS SK+ (Stratagene). In another aspect, the
invention provides improved vectors based on pBS SK+ that are designed for the practice
of the invention such as pBS SK+/DGT1, pBS SK+/DGT2 and pBS SK+/DGT3,
described in detail below. Such improved vectors can also be based on pBC SK+ or other
suitable vectors well known to one skilled in the art.

Preferred vectors are improved vectors based on the plasmid vector pBluescript
(pBS or pBC) SK+ (Stratagene) in which a portion of the nucleotide sequence from
positions 656 to 764 was removed and replaced with a sequence of at least 110
nucleotides including a NotI restriction endonuclease site. This region, designated the
multiple cloning site (MCS), spans the portion of the nucleotide sequence from the SacI
site to the KpnI site.

The vector can be the plasmid pBC SK+ cleaved with ClaI and NotI, in which
case the 3'-PCR primer in step (6) can be G-A-A-C-A-A-A-A-G-C-T-G-G-A-G-C-T-C-
C-A-C-C-G-C (SEQ ID NO: 4). In a preferred embodiment, the vector is chosen from
the group consisting of pBC SK+, pBS SK+ and pBS SK+/DGT1 and the 3'-PCR primer
in step (g) is G-A-G-C-T-C-C-A-C-C-G-C-G-G-T (SEQ ID NO: 18).

Typically the restriction endonuclease used in step (d) has a nucleotide sequence
recognition that includes the four-nucleotide sequence of the second restriction
endonuclease used in step (b). In general, the sites for such restriction endonucleases
must be in the vector sequence 5' to the ClaI site as well as in the MCS between the ClaI
site and the NotI site.

In one embodiment, the vector is the plasmid pBC SK+ and MspI is used both as the
second restriction endonuclease and as the linearization restriction endonuclease used in
step (d).

In another embodiment, the vector is the plasmid pBC SK+, the second restriction
endonuclease is chosen from the group consisting of MspI, MaeII, TaqI and HinP1I and
the linearization in step (d) is accomplished by a first digestion with SmaI followed by a
second digestion with a mixture of KpnI and ApaI.

In other embodiments the vector is chosen from the group consisting of pBS
SK+/DGT1, pBS SK+/DGT2 and pBS SK+/DGT3. In such embodiments, one suitable
enzyme combination is provided where the second restriction endonuclease is MspI and
the restriction endonuclease used in step (d) is SmaI. Another suitable combination is
provided where the second restriction endonuclease is TaqI and the restriction
endonuclease used in step (d) is XhoI. A further suitable combination is provided where
the second restriction endonuclease is HinP1I and the restriction endonuclease used in
step (d) is NarI. Yet another suitable combination is provided where the second
restriction endonuclease is MaeII and the restriction endonuclease used in step (d) is
AatII.

Typically the bacteriophage-specific promoter is selected from the group
consisting of T3 promoter, T7 promoter and SP6 promoter. Most typically it is the T3
promoter.

Typically, the sixteen 5'-RT primers for priming of transcription of cDNA from
cRNA have the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3).
In another preferred embodiment, the four 5'-RT primers for priming of transcription of
cDNA from cRNA have the sequence G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N (SEQ ID
NO: 9).
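
The degenerate 5'-RT primer sequences above stand for sets of sixteen and four concrete primers respectively. A minimal sketch of expanding such a template, replacing each N by A, C, G and T in turn, is shown below; the helper name `expand_n` is hypothetical.

```python
from itertools import product

SEQ_ID_NO_3 = "AGGTCGACGGTATCGGNN"   # sixteen 5'-RT primers (x = 2)
SEQ_ID_NO_9 = "GGTCGACGGTATCGGN"     # four 5'-RT primers (x = 1)


def expand_n(template):
    """Return every concrete sequence obtained by filling each N with A, C, G or T."""
    slots = ["ACGT" if base == "N" else base for base in template]
    return ["".join(choice) for choice in product(*slots)]


rt_primers_16 = expand_n(SEQ_ID_NO_3)
rt_primers_4 = expand_n(SEQ_ID_NO_9)
print(len(rt_primers_16), len(rt_primers_4))  # 16 4
print(rt_primers_4)
```

Each expanded primer would be used in its own subpool, so the list length equals the number of subpools.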

The second restriction endonuclease recognizing a four-nucleotide sequence is
typically MspI; alternatively, it can be TaqI, MaeII or HinP1I. The restriction
endonuclease cleaving at a single site in each of the mixture of anchor primers is
typically NotI.

Typically, the mRNA population has been enriched for polyadenylated mRNA 
species. 

A typical host cell is a strain of Escherichia coli.

The step of generating linearized fragments of the cloned inserts typically
comprises:

(a) dividing the plasmid containing the insert into two fractions, a first
fraction cleaved with the restriction endonuclease XhoI and a second fraction cleaved
with the restriction endonuclease SalI;

(b) recombining the first and second fractions after cleavage;

(c) dividing the recombined fractions into thirds and cleaving the first
third with the restriction endonuclease HindIII, the second third with the restriction
endonuclease BamHI, and the third third with the restriction endonuclease EcoRI; and

(d) recombining the thirds after digestion in order to produce a population
of linearized fragments of which about one-sixth of the population corresponds to the
product of cleavage by each of the possible combinations of enzymes.
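
The one-sixth figure follows from the split-and-pool scheme: halves at the first stage, thirds at the second. The sketch below enumerates the enzyme combinations and their expected shares, assuming equal fractions and complete digestion (both assumptions for illustration); the names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FIRST_STAGE = ["XhoI", "SalI"]                # sub-step (a): two fractions
SECOND_STAGE = ["HindIII", "BamHI", "EcoRI"]  # sub-step (c): three fractions


def pooled_shares():
    """Expected share of the final pool for each (first, second) enzyme pair."""
    share = Fraction(1, len(FIRST_STAGE)) * Fraction(1, len(SECOND_STAGE))
    return {combo: share for combo in product(FIRST_STAGE, SECOND_STAGE)}


shares = pooled_shares()
for combo, share in shares.items():
    print(combo, share)
print(sum(shares.values()))  # 1
```

Six combinations result, each accounting for one-sixth of the linearized population.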

In another embodiment, wherein the vector is chosen from the group consisting of
pBC SK+ and pBS SK+, and MspI is the second restriction endonuclease, MspI can be
used as the linearization restriction endonuclease used in step (d). Alternatively, where
the vector is the plasmid pBC SK+, linearization can be accomplished by a first digestion
with SmaI followed by a second digestion with a mixture of KpnI and ApaI.

In other embodiments, the vector is chosen from the group consisting of pBS
SK+/DGT1, pBS SK+/DGT2 and pBS SK+/DGT3 and the linearization restriction
endonuclease used in step (d) is chosen from the group consisting of SmaI, XhoI, NarI
and AatII.

Typically, the step of resolving the polymerase chain reaction amplified
fragments by electrophoresis comprises electrophoresis of the fragments on at least two
gels.

Each sequence-specific PCR product, or polymerase chain reaction amplified
fragment, is identified by a digital address consisting of a sequence identifier, the length
of the product in nucleotide residues and the intensity of labeling of the PCR product,
defined as the area under the peak of the detector output for that PCR product.

The sequence identifier is defined by a 5' component and a 3' component. The 5'
component of the sequence identifier is the recognition site of the second restriction
endonuclease used to cleave the double-stranded cDNA population prepared from the
original mRNA population. Typically, the restriction endonuclease is MspI and the 5'
component of the sequence identifier is -C-C-G-G. The 3' component of the sequence
identifier is the sequence defined by the 3' terminus sequence of the 5'-PCR primer. For
example, the 3' component of the sequence identifier of the PCR product indicated as
"111" in Fig. 2 is -C-T-G-C. Therefore, in this case, the sequence identifier would be
-C-C-G-G-C-T-G-C.
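
The digital address described above can be illustrated with a minimal sketch. The class
and function names, and the length and intensity values, are hypothetical and chosen only
for illustration; only the MspI recognition site and the -C-T-G-C example are taken from
the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalAddress:
    """Hypothetical record for one sequence-specific PCR product."""
    sequence_identifier: str  # 5' component followed by 3' component
    length_nt: int            # product length in nucleotide residues
    intensity: float          # area under the detector peak


def sequence_identifier(recognition_site: str, primer_3prime_terminus: str) -> str:
    """Join the second-enzyme recognition site (5' component) with the
    3'-terminal bases of the 5'-PCR primer (3' component)."""
    return recognition_site.upper() + primer_3prime_terminus.upper()


# MspI site and the -C-T-G-C example from the text; length and intensity are made up.
example = DigitalAddress(sequence_identifier("CCGG", "CTGC"), length_nt=250, intensity=1250.0)
print(example.sequence_identifier)
```

A frozen dataclass is used here so that an address can serve as a dictionary key in a
database-like lookup; that is a design choice of the sketch, not a requirement of the method.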

Typically, a database comprising the digital address, as defined above as 
sequence identifier and the length of the sequence-specific PCR product in nucleotide 




WO 00/00646 PCT/US99/14940 



residues and the intensity of labeling of the PCR product, defined as the area under the
peak of the detector output for that PCR product, is constructed and maintained using
suitable computer hardware and computer software. Preferably, such a database further
comprises data concerning sequence relationships, gene mapping, cellular distributions,
experimental treatment conditions and any other information considered relevant to gene
function.

The method can further comprise determining the sequence of the 3'-end of at
least one of the mRNAs, such as by:

(1) eluting at least one cDNA corresponding to a mRNA from an
electropherogram in which bands representing the 3'-ends of mRNAs present in the
sample are displayed;

(2) amplifying the eluted cDNA in a polymerase chain reaction;

(3) cloning the amplified cDNA into a plasmid;

(4) producing DNA corresponding to the cloned DNA from the plasmid; and

(5) sequencing the cloned cDNA.

Another aspect of the invention is a method of simultaneous sequence-specific
identification of mRNAs corresponding to members of an antisense cRNA pool
representing the 3'-ends of a population of mRNAs, the antisense cRNAs that are
members of the antisense cRNA pool being terminated at their 5'-end with a primer
sequence corresponding to a bacteriophage-specific vector and at their 3'-end with a
sequence corresponding in sequence to a sequence of the vector.
The method comprises:

(1) dividing the members of the antisense cRNA pool into sixteen
subpools and transcribing first-strand cDNA from each subpool, using a thermostable
reverse transcriptase and one of sixteen 5'-RT primers whose 3'-terminus is -N-N,
wherein N is one of the four deoxyribonucleotides A, C, G, or T, the 5'-RT primer being
at least 15 nucleotides in length, corresponding in sequence to the 3'-end of the
bacteriophage-specific promoter, and extending across into at least the first two
nucleotides of the cRNA, the mixture including all possibilities for the 3'-terminal two
nucleotides;

(2) using the product of transcription in each of the sixteen subpools as a
template for a polymerase chain reaction with a 3'-PCR primer that corresponds in
sequence to a sequence in the vector adjoining the site of insertion of the cDNA sample
in the vector and a 5'-PCR primer selected from the group consisting of: (i) the 5'-RT
primer from which first-strand cDNA was made for that subpool; (ii) the 5'-RT primer
from which the first-strand cDNA was made for that subpool extended at its 3'-terminus
by an additional residue -N, where N can be any of A, C, G, or T; and (iii) the 5'-RT
primer used for the synthesis of first-strand cDNA for that subpool extended at its
3'-terminus by two additional residues -N-N, wherein N can be any of A, C, G, or T, to
produce polymerase chain reaction amplified fragments; and

(3) resolving the polymerase chain reaction amplified fragments by
electrophoresis to display bands representing the 3'-ends of mRNAs present in the
sample.
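
The relationship between the sixteen 5'-RT primers of step (1) and the 5'-PCR primer
options (i) to (iii) of step (2) can be sketched as follows. This is an illustration only:
the function names are hypothetical, and the fixed primer core is assumed to be the
non-degenerate part of SEQ ID NO: 3 as printed elsewhere in this specification.

```python
from itertools import product

BASES = "ACGT"
# Assumed core: the fixed portion of SEQ ID NO: 3 (A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G).
CORE = "AGGTCGACGGTATCGG"


def rt_primers(core: str = CORE) -> list[str]:
    """Sixteen 5'-RT primers: the core followed by every -N-N terminus."""
    return [core + "".join(nn) for nn in product(BASES, repeat=2)]


def pcr_primer_options(rt_primer: str, extra: int) -> list[str]:
    """5'-PCR primers for one subpool: the RT primer itself (extra=0), or the
    RT primer extended at its 3'-terminus by one or two additional residues."""
    if extra not in (0, 1, 2):
        raise ValueError("extra must be 0, 1 or 2")
    return [rt_primer + "".join(ext) for ext in product(BASES, repeat=extra)]


subpool_primers = rt_primers()
print(len(subpool_primers))                             # one primer per subpool
print(len(pcr_primer_options(subpool_primers[0], 2)))   # options for -N-N extension
```

Under option (iii), each of the sixteen subpools is thus split across sixteen PCR
reactions, one per two-residue extension.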

In another preferred embodiment, the method comprises:

(1) dividing the cRNA preparation into subpools and transcribing first-strand
cDNA from each subpool, using a reverse transcriptase and one of the 5'-RT
primers defined as having a 3'-terminus consisting of -Nx, wherein "N" is one of the four
deoxyribonucleotides A, C, G, or T, and "x" is an integer from 1 to 5, the 5'-RT primer
being 15 to 30 nucleotides in length and complementary to the 5' flanking vector
sequence with the 5'-RT primer's complementarity extending across into the
insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x", wherein a
different one of said 5'-RT primers is used in different subpools and wherein there are 4
subpools if "x" = 1, 16 subpools if "x" = 2, 64 subpools if "x" = 3, 256 subpools if "x" =
4, and 1,024 subpools if "x" = 5;

(2) using the product of first-strand cDNA transcription in each of the
subpools as a template for a polymerase chain reaction with a 3'-PCR primer of 15 to 30
nucleotides in length that is complementary to 3' flanking vector sequences between said
first restriction endonuclease site and the site defining transcription initiation by the
bacteriophage-specific promoter and a 5'-PCR primer having a 3'-terminus consisting of
-Nx-Ny, where "N" and "x" are as in step (f), -Nx is the same sequence as in the 5'-RT
primer from which first-strand cDNA was made for that subpool, and "y" is a whole
integer such that x + y equals an integer selected from the group consisting of 3, 4, 5 and
6, the 5'-PCR primer being 15 to 30 nucleotides in length and complementary to the 5'
flanking vector sequence with the 5'-PCR primer's complementarity extending across
into the insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x +
y", to produce polymerase chain reaction amplified fragments; and

(3) resolving the polymerase chain reaction amplified fragments to
generate a display of sequence-specific products representing the 3'-ends of different
mRNAs present in the mRNA population.
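
The subpool counts recited above follow from the four possible bases at each of the "x"
insert-specific positions, that is, 4 raised to the power x. A minimal sketch of this
arithmetic, together with the values of "y" permitted by the condition that x + y equals
3, 4, 5 or 6, is given below. The function names are hypothetical, and "whole integer" is
assumed here to include zero.

```python
def subpool_count(x: int) -> int:
    """Number of subpools when the 5'-RT primer reads x insert-specific bases."""
    if not 1 <= x <= 5:
        raise ValueError("x must be an integer from 1 to 5")
    return 4 ** x


def allowed_y(x: int) -> list[int]:
    """Values of y such that x + y is 3, 4, 5 or 6 (y assumed non-negative)."""
    if not 1 <= x <= 5:
        raise ValueError("x must be an integer from 1 to 5")
    return [total - x for total in (3, 4, 5, 6) if total - x >= 0]


for x in range(1, 6):
    print(x, subpool_count(x), allowed_y(x))
```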

Yet another aspect of the present invention is a method for detecting a change in
the pattern of mRNA expression in a tissue associated with a physiological or
pathological change. This method comprises the steps of:

(1) obtaining a first sample of a tissue that is not subject to the
physiological or pathological change;

(2) determining the pattern of mRNA expression in the first sample of the
tissue by performing steps (1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding to members of an antisense
cRNA pool representing the 3'-ends of a population of mRNAs to generate a first display
of bands representing the 3'-ends of mRNAs present in the first sample;

(3) obtaining a second sample of the tissue that has been subject to the
physiological or pathological change;

(4) determining the pattern of mRNA expression in the second sample of
the tissue by performing steps (1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding to members of an antisense
cRNA pool to generate a second display of bands representing the 3'-ends of mRNAs
present in the second sample; and

(5) comparing the first and second displays to determine the effect of the
physiological or pathological change on the pattern of mRNA expression in the tissue.
The comparison is typically made in adjacent lanes.
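
The comparison in step (5) can be sketched computationally if each display is represented
as a mapping from a band (sequence identifier and length) to its labeling intensity. The
representation, the function name, the two-fold threshold and all data values below are
assumptions made for illustration; the specification itself does not prescribe a threshold.

```python
Band = tuple[str, int]  # (sequence identifier, product length in nucleotides)


def compare_displays(first: dict[Band, float],
                     second: dict[Band, float],
                     min_fold: float = 2.0) -> dict[Band, str]:
    """Report bands whose intensity differs between two displays.

    min_fold is a hypothetical cut-off, not a value taken from the text.
    """
    changes: dict[Band, str] = {}
    for band in sorted(first.keys() | second.keys()):
        a = first.get(band, 0.0)
        b = second.get(band, 0.0)
        if a <= 0.0 and b <= 0.0:
            continue
        if a <= 0.0:
            changes[band] = "appeared"
        elif b <= 0.0:
            changes[band] = "disappeared"
        elif b / a >= min_fold:
            changes[band] = "increased"
        elif a / b >= min_fold:
            changes[band] = "decreased"
    return changes


# Made-up intensities for an unaffected and an affected tissue sample.
control = {("CCGGCTGC", 210): 100.0, ("CCGGAATC", 150): 80.0, ("CCGGTTGA", 300): 40.0}
treated = {("CCGGCTGC", 210): 105.0, ("CCGGAATC", 150): 320.0, ("CCGGGGAT", 95): 60.0}
print(compare_displays(control, treated))
```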

The tissue can be derived from the central nervous system or from particular 
structures within the central nervous system. The tissue can alternatively be derived 
from another organ or organ system. 

Another aspect of the present invention is a method of screening for a side effect 
of a drug. The method can comprise the steps of: 

(1) obtaining a first sample of tissue from an organism treated with a 
compound of known physiological function; 

(2) determining the pattern of mRNA expression in the first sample of the
tissue by performing steps (1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding to members of an antisense 
cRNA pool to generate a first display of bands representing the 3 '-ends of mRNAs 
present in the first sample; 

(3) obtaining a second sample of tissue from an organism treated with a 
drug to be screened for a side effect; 

(4) determining the pattern of mRNA expression in the second sample of
the tissue by performing steps (1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding to members of an antisense
cRNA pool to generate a second display of bands representing the 3 '-ends of mRNAs 
present in the second sample; and 

(5) comparing the first and second displays in order to detect the presence 
of mRNA species whose expression is not affected by the known compound but is 
affected by the drug to be screened, thereby indicating a difference in action of the drug 
to be screened and the known compound and thus a side effect. 

The drug to be screened can be a drug affecting the central nervous system, such 
as an antidepressant, a neuroleptic, a tranquilizer, an anticonvulsant, a monoamine 
oxidase inhibitor, or a stimulant. Alternatively, the drug can be another class of drug 
such as an anti-parkinsonism agent, a skeletal muscle relaxant, an analgesic, a local
anesthetic, a cholinergic, an antispasmodic, a steroid, or a non-steroidal
anti-inflammatory drug.

Another aspect of the present invention is panels of primers and degenerate
mixtures of primers suitable for the practice of the present invention. These include:

(1) a panel of 5'-RT primers comprising 16 primers of the sequence
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3), wherein N is one of the four
deoxyribonucleotides A, C, G, or T;

(2) a panel of 5'-RT primers comprising 64 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 5), wherein N is one of the
four deoxyribonucleotides A, C, G, or T;

(3) a panel of 5'-PCR primers comprising 256 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6), wherein N is one of
the four deoxyribonucleotides A, C, G, or T;

(4) a panel of 5'-PCR primers comprising 1024 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 24), wherein N is
one of the four deoxyribonucleotides A, C, G, or T;

(5) a panel of 5'-PCR primers comprising 4096 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N-N (SEQ ID NO: 25), wherein N is
one of the four deoxyribonucleotides A, C, G, or T;

(6) a panel of anchor primers comprising 12 primers of the sequences
A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-
T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a
deoxyribonucleotide selected from the group consisting of A, C, G, and T;

(7) a panel of anchor primers comprising 3 primers of the sequences
A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-
T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V (SEQ ID NO: 23), wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G;

(8) a panel of 5'-RT primers comprising 4 different oligonucleotides
each having the sequence G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N (SEQ ID NO: 9),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(9) a panel of 5'-RT primers comprising 16 different oligonucleotides
each having the sequence G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 7),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(10) a panel of 5'-PCR primers comprising 64 different oligonucleotides
each having the sequence T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 13),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(11) a panel of 5'-PCR primers comprising 256 different oligonucleotides
each having the sequence C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 14),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(12) a panel of 5'-PCR primers comprising 1024 different oligonucleotides
each having the sequence G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 15),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(13) a panel of 5'-PCR primers comprising 4096 different
oligonucleotides each having the sequence A-C-G-G-T-A-T-C-G-G-N-N-N-N-N-N
(SEQ ID NO: 16), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(14) a panel of 5'-RT primers comprising 4 different oligonucleotides
each having the sequence C-T-T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N (SEQ ID
NO: 10), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(15) a panel of 5'-RT primers comprising 16 different oligonucleotides
each having the sequence T-T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N (SEQ ID
NO: 11), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(16) a panel of 5'-PCR primers comprising 64 different oligonucleotides
each having the sequence T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N (SEQ ID
NO: 12), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(17) a panel of 5'-PCR primers comprising 256 different oligonucleotides
each having the sequence C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N (SEQ ID
NO: 17), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(18) a panel of 5'-PCR primers comprising 1024 different
oligonucleotides each having the sequence A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-
N-N (SEQ ID NO: 26), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(19) a panel of 5'-PCR primers comprising 4096 different
oligonucleotides each having the sequence G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N-
N-N (SEQ ID NO: 27), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(20) a degenerate mixture of anchor primers comprising a mixture of 3
primers of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-
A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V (SEQ ID NO: 23), wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G, each of the 3
primers being present in about an equimolar quantity; and

(21) a degenerate mixture of anchor primers comprising a mixture of 12
primers of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-
A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a
deoxyribonucleotide selected from the group consisting of A, C, G, and T, each of the 12
primers being present in about an equimolar quantity.
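
The panel sizes recited above (4, 16, 64, 256, 1024 and 4096 primers, and 3 or 12 anchor
primers) follow from expanding each degenerate position into its allowed bases. A minimal
sketch of that expansion is given below. The function name is hypothetical, and only the
two degenerate symbols used in this specification, N and V, are handled.

```python
from itertools import product

# Degenerate symbols as defined in the text: N = A, C, G or T; V = A, C or G.
DEGENERATE = {"N": "ACGT", "V": "ACG"}


def expand(template: str) -> list[str]:
    """Return every concrete primer encoded by a degenerate template."""
    choices = [DEGENERATE.get(base, base) for base in template]
    return ["".join(combo) for combo in product(*choices)]


rt_core = "AGGTCGACGGTATCGG"  # fixed portion of SEQ ID NO: 3
for n in range(2, 7):
    print(n, len(expand(rt_core + "N" * n)))

# Anchor primer templates written out from SEQ ID NO: 2 and SEQ ID NO: 23.
anchor_core = "AACTGGAAGAATTCGCGGCCGCAGGAA" + "T" * 18
print(len(expand(anchor_core + "VN")), len(expand(anchor_core + "V")))
```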
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features, aspects, and advantages of the present invention will
become better understood with reference to the following description, appended claims,
and accompanying drawings where:

Figure 1 is a diagrammatic depiction of the method of the present invention
showing the various stages of priming, cleavage, cloning and amplification;

Figure 2 is an autoradiogram of a gel showing the result of performing the
method of the present invention using several 5'-primers in the PCR step corresponding
to known sequences of brain mRNAs and using liver and brain mRNA as starting
material; and

Figure 3 shows the nucleotide sequence of the multiple cloning sites of plasmids
pBS SK+/DGT1, pBS SK+/DGT2, and pBS SK+/DGT3.

DESCRIPTION 

We have developed a method for simultaneous sequence-specific identification 
and display of mRNAs in a mRNA population. 

As discussed below, this method has a number of applications in drug screening,
the study of physiological and pathological conditions, and genomic mapping.

SIMULTANEOUS SEQUENCE-SPECIFIC IDENTIFICATION OF mRNAs
A method according to the present invention, based on the polymerase chain
reaction (PCR) technique, provides means for visualization of nearly every mRNA
expressed by a tissue as a distinct band on a gel whose intensity corresponds roughly to
the concentration of the mRNA. The method is based on the observation that virtually
all mRNAs conclude with a 3'-poly(A) tail but does not rely on the specificity of primer
binding to the tail.

In general, the method comprises: 

(1) preparing double-stranded cDNAs from a mRNA population using a
mixture of 12 anchor primers, the anchor primers each including: (i) a tract of from 7 to
40 T residues; (ii) a site for cleavage by a first restriction endonuclease that recognizes
more than six bases, the site for cleavage being located to the 5'-side of the tract of T
residues; (iii) a first stuffer segment of from 4 to 40 nucleotides, the first stuffer segment
being located to the 5'-side of the site for cleavage by the restriction endonuclease; and
(iv) phasing residues -V-N located at the 3' end of each of the anchor primers, wherein V
is a deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a
deoxyribonucleotide selected from the group consisting of A, C, G, and T, the mixture
including anchor primers containing all possibilities for V and N;

(2) producing cloned inserts from a suitable host cell that has been
transformed by a vector, the vector having the cDNA sample that has been cleaved with a
first restriction endonuclease and a second restriction endonuclease inserted therein, the
cleaved cDNA sample being inserted in the vector in an orientation that is antisense with
respect to a bacteriophage-specific promoter within the vector, the second restriction
endonuclease recognizing a four-nucleotide sequence and the first restriction
endonuclease cleaving at a single site within each member of the mixture of anchor
primers;

(3) generating linearized fragments of the cloned inserts by digestion with
at least one restriction endonuclease that is different from the first and second restriction
endonucleases;

(4) generating a cRNA preparation of antisense cRNA transcripts by
incubation of the linearized fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the bacteriophage-specific promoter;

(5) dividing the cRNA preparation into sixteen subpools and transcribing
first-strand cDNA from each subpool, using a thermostable reverse transcriptase and one
of sixteen 5'-RT primers whose 3'-terminus is -N-N, wherein N is one of the four
deoxyribonucleotides A, C, G, or T, the 5'-RT primer being at least 15 nucleotides in
length, corresponding in sequence to the 3'-end of the bacteriophage-specific promoter,
and extending across into at least the first two nucleotides of the cRNA, the mixture
including all possibilities for the 3'-terminal two nucleotides;

(6) using the product of transcription in each of the sixteen subpools as a
template for a polymerase chain reaction with a 3'-PCR primer that corresponds in
sequence to a sequence in the vector adjoining the site of insertion of the cDNA sample
in the vector and a 5'-PCR primer selected from the group consisting of: (i) the 5'-RT
primer from which first-strand cDNA was made for that subpool; (ii) the 5'-RT primer
from which the first-strand cDNA was made for that subpool extended at its 3'-terminus
by an additional residue -N, where N can be any of A, C, G, or T; and (iii) the 5'-RT
primer used for the synthesis of first-strand cDNA for that subpool extended at its
3'-terminus by two additional residues -N-N, wherein N can be any of A, C, G, or T, to
produce polymerase chain reaction amplified fragments; and

(7) resolving the polymerase chain reaction amplified fragments by
electrophoresis to display bands representing the 3'-ends of mRNAs present in the
sample. A depiction of this scheme is shown in Figure 1.

In another embodiment, the method comprises the steps of: 

(a) preparing a double-stranded cDNA population from an mRNA
population using a mixture of anchor primers, the anchor primers each including: (i) a
tract of from 7 to 40 T residues; (ii) a site for cleavage by a first restriction endonuclease
that recognizes more than six bases, the site for cleavage being located to the 5'-side of
the tract of T residues; (iii) a first stuffer segment of from 4 to 40 nucleotides, the first
stuffer segment being located to the 5'-side of the site for cleavage by the first restriction
endonuclease; and (iv) phasing residues located at the 3' end of each of the anchor
primers selected from the group consisting of -V and -V-N, wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a
deoxyribonucleotide selected from the group consisting of A, C, G, and T, the mixture
including anchor primers containing all possibilities for V and N where the phasing
residues in the mixture are defined by one of -V or -V-N;

(b) cleaving the double-stranded cDNA population with the first
restriction endonuclease and with a second restriction endonuclease, the second
restriction endonuclease recognizing a four-nucleotide sequence, to form a population of
double-stranded cDNA molecules having first and second termini, respectively;

(c) inserting the double-stranded cDNA molecules from step (b) each
into a vector in an orientation that is antisense with respect to a bacteriophage-specific
promoter within the vector to form a population of vectors containing the inserted cDNA
molecules, said inserting defining 3' and 5' flanking vector sequences such that 5' is
upstream from the sense strand of the inserted cDNA and 3' is downstream of the sense
strand, and said vector having a 3' flanking nucleotide sequence of at least 15
nucleotides in length between said first restriction endonuclease site and a site defining
transcription initiation in said promoter;

(d) generating linearized fragments containing the inserted cDNA
molecules by digestion of the vectors produced in step (c) with at least one restriction
endonuclease that does not recognize sequences in the inserted cDNA molecules or in the
bacteriophage-specific promoter, but does recognize sequences in the vector such that the
resulting linearized fragments have a 5' flanking vector sequence of at least 15
nucleotides 5' to the site of insertion of the cDNA sample into the vector at the cDNA's
second terminus;

(e) generating a cRNA preparation of antisense cRNA transcripts by
incubation of the linearized fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the bacteriophage-specific promoter;

(f) dividing the cRNA preparation into subpools and transcribing first-strand
cDNA from each subpool, using a reverse transcriptase and one of the 5'-RT
primers defined as having a 3'-terminus consisting of -Nx, wherein "N" is one of the four
deoxyribonucleotides A, C, G, or T, and "x" is an integer from 1 to 5, the 5'-RT primer
being 15 to 30 nucleotides in length and complementary to the 5' flanking vector
sequence with the 5'-RT primer's complementarity extending across into the
insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x", wherein a
different one of said 5'-RT primers is used in different subpools and wherein there are 4
subpools if "x" = 1, 16 subpools if "x" = 2, 64 subpools if "x" = 3, 256 subpools if "x" =
4, and 1,024 subpools if "x" = 5;

(g) using the product of first-strand cDNA transcription in each of the
subpools as a template for a polymerase chain reaction with a 3'-PCR primer of 15 to 30
nucleotides in length that is complementary to 3' flanking vector sequences between said
first restriction endonuclease site and the site defining transcription initiation by the
bacteriophage-specific promoter and a 5'-PCR primer having a 3'-terminus consisting of
-Nx-Ny, where "N" and "x" are as in step (f), -Nx is the same sequence as in the 5'-RT
primer from which first-strand cDNA was made for that subpool, and "y" is a whole
integer such that x + y equals an integer selected from the group consisting of 3, 4, 5 and
6, the 5'-PCR primer being 15 to 30 nucleotides in length and complementary to the 5'
flanking vector sequence with the 5'-PCR primer's complementarity extending across
into the insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x +
y", to produce polymerase chain reaction amplified fragments; and

(h) resolving the polymerase chain reaction amplified fragments to
generate a display of sequence-specific products representing the 3'-ends of different
mRNAs present in the mRNA population.

A. Isolation of mRNA

The first step in the method is isolation or provision of a mRNA population.
Methods of extraction of RNA are well-known in the art and are described, for example,
in J. Sambrook et al., "Molecular Cloning: A Laboratory Manual" (Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York, 1989), vol. 1, ch. 7, "Extraction,
Purification, and Analysis of Messenger RNA from Eukaryotic Cells," incorporated
herein by this reference. Other isolation and extraction methods are also well-known.
Typically, isolation is performed in the presence of chaotropic agents such as
guanidinium chloride or guanidinium thiocyanate, although other detergents and
extraction agents can alternatively be used.

Typically, the mRNA is isolated from the total extracted RNA by
chromatography over oligo(dT)-cellulose or other chromatographic media that have the
capacity to bind the polyadenylated 3'-portion of mRNA molecules. Alternatively, but
less preferably, total RNA can be used. However, it is generally preferred to isolate
poly(A)+ RNA.

B. Preparation of Double-Stranded cDNA

Double-stranded cDNAs are then prepared from the mRNA population using a
mixture of anchor primers to initiate reverse transcription. The anchor primers each
include: (i) a tract of from 7 to 40 T residues; (ii) a site for cleavage by a restriction
endonuclease that recognizes more than six bases, the site for cleavage being located to
the 5'-side of the tract of T residues; (iii) a first stuffer segment of from 4 to 40
nucleotides, the first stuffer segment being located to the 5'-side of the site for cleavage
by the restriction endonuclease; (iv) a second stuffer segment of zero to eight residues
interposed between the site for cleavage by a restriction endonuclease that recognizes
more than six bases and the tract of T residues; and (v) phasing residues chosen from the
group consisting of -V and -V-N located at the 3' end of each of the anchor primers,
wherein V is a deoxyribonucleotide selected from the group consisting of A, C, and G;
and N is a deoxyribonucleotide selected from the group consisting of A, C, G, and T.
The mixture includes anchor primers containing all possibilities for V and N. Where the
anchor primers have phasing residues of -V, the mixture comprises a mixture of three
anchor primers. Where the anchor primers have phasing residues of -V-N, the mixture
comprises a mixture of twelve anchor primers.

Typically, the anchor primers each have 18 T residues in the tract of T residues,
and the first stuffer segment of the anchor primers is 14 residues in length. A suitable
sequence of the first stuffer segment is A-A-C-T-G-G-A-A-G-A-A-T-T-C (SEQ ID NO:
1). Typically, the site for cleavage by a restriction endonuclease that recognizes more
than six bases is the NotI cleavage site. One preferred set of three anchor primers has the
sequence A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-
T-T-T-T-T-T-T-T-T-T-T-T-T-T-V (SEQ ID NO: 23). Another preferred set of twelve
anchor primers has the sequence A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-
A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2).
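
The modular structure of the anchor primers can be illustrated by assembling one from
its parts and checking the length limits stated above. This is a sketch under stated
assumptions: the function name is hypothetical, and the second stuffer segment "AGGAA"
is read off SEQ ID NO: 2 as printed here (the residues lying between the NotI site and
the T tract) rather than being named as such in the text.

```python
def build_anchor_primer(first_stuffer: str, site: str, second_stuffer: str,
                        t_count: int, phasing: str) -> str:
    """Assemble an anchor primer 5' to 3' and enforce the stated length limits."""
    if not 4 <= len(first_stuffer) <= 40:
        raise ValueError("first stuffer segment must be 4 to 40 nucleotides")
    if len(site) <= 6:
        raise ValueError("cleavage site must span more than six bases")
    if not 0 <= len(second_stuffer) <= 8:
        raise ValueError("second stuffer segment must be 0 to 8 residues")
    if not 7 <= t_count <= 40:
        raise ValueError("T tract must be 7 to 40 residues")
    if phasing not in ("V", "VN"):
        raise ValueError("phasing residues must be -V or -V-N")
    return first_stuffer + site + second_stuffer + "T" * t_count + phasing


FIRST_STUFFER = "AACTGGAAGAATTC"  # SEQ ID NO: 1
NOTI_SITE = "GCGGCCGC"            # NotI recognition sequence
SECOND_STUFFER = "AGGAA"          # assumption: inferred from SEQ ID NO: 2

anchor_vn = build_anchor_primer(FIRST_STUFFER, NOTI_SITE, SECOND_STUFFER, 18, "VN")
print(anchor_vn, len(anchor_vn))
```

The degenerate template produced this way still has to be expanded into its three (-V) or
twelve (-V-N) concrete primers before use.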

One member of this mixture of anchor primers initiates synthesis at a fixed
position at the 3'-end of all copies of each mRNA species in the sample, thereby defining
a 3'-end point for each species.

This reaction is carried out under conditions for the preparation of double- 
stranded cDNA from mRNA that are well-known in the art. Such techniques are 
described, for example, in Volume 2 of J. Sambrook et al., "Molecular Cloning: A 
Laboratory Manual", entitled "Construction and Analysis of cDNA Libraries." Suitable 
reverse transcriptases include those from avian myeloblastosis virus (AMV) and
Moloney murine leukemia virus (MMLV). A preferred reverse transcriptase is the
MMLV reverse transcriptase.

C. Cleavage of the cDNA Sample With Restriction Endonucleases 
The cDNA sample is cleaved with two restriction endonucleases. The first 
restriction endonuclease recognizes a site longer than six bases and cleaves at a single 
site within each member of the mixture of anchor primers. The second restriction 
endonuclease is an endonuclease that recognizes a 4-nucleotide sequence. Such 
endonucleases typically cleave at multiple sites in most cDNAs. Typically, the first 
restriction endonuclease is NotI and the second restriction endonuclease is MspI. The
enzyme NotI does not cleave within most cDNAs. This is desirable to minimize the loss
of cloned inserts that would result from cleavage of the cDNAs at locations other than in 
the anchor site. 
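
The effect of the double digestion on a single cDNA can be sketched on a toy sequence.
The sketch below works on the sense strand only and assumes the usual cut positions
for these enzymes (MspI cutting C^CGG and NotI cutting GC^GGCCGC); the function name
and the toy sequence are hypothetical, and the case of a NotI site inside the cDNA
itself is deliberately ignored.

```python
MSPI_SITE = "CCGG"       # MspI recognition sequence, assumed cut C^CGG
NOTI_SITE = "GCGGCCGC"   # NotI recognition sequence, assumed cut GC^GGCCGC


def three_prime_fragment(sense_strand: str):
    """Return the fragment running from the 3'-most MspI cut to the NotI cut
    in the anchor, or None when the cDNA has no MspI site upstream of it."""
    not_pos = sense_strand.rfind(NOTI_SITE)
    if not_pos < 0:
        raise ValueError("no NotI site found; anchor sequence missing")
    msp_pos = sense_strand.rfind(MSPI_SITE, 0, not_pos)
    if msp_pos < 0:
        return None  # such an mRNA would need a different second enzyme
    return sense_strand[msp_pos + 1:not_pos + 2]


# Toy cDNA: two MspI sites, a poly(A) tract, then the anchor-derived NotI site.
toy = "ATGCCGGTTACCGGAGCTA" + "A" * 18 + "TTCCT" + NOTI_SITE + "GAATTC"
print(three_prime_fragment(toy))
```

Only the fragment adjacent to the poly(A) tail is returned, which mirrors why each mRNA
species contributes a single product of defined length to the display.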

Alternatively, the second restriction endonuclease can be TaqI, MaeII or HinP1I.
The use of the latter two restriction endonucleases can detect mRNAs that are not
cleaved by MspI. The second restriction endonuclease generates a 5'-overhang
compatible for cloning into the desired vector, as discussed below. This cloning, for the
vector chosen from the group consisting of pBC SK+, pBS SK+, pBS SK+/DGT1, pBS
SK+/DGT2 and pBS SK+/DGT3, is into the ClaI site, as discussed below.

Alternatively, other suitable restriction endonucleases can be used to detect
cDNAs not cleaved by the above restriction endonucleases. In such embodiments,
suitable first restriction endonucleases that recognize more than six bases are AscI, BaeI,
FseI, NotI, PacI, PmeI, PpuMI, RsrII, SapI, SexAI, SfiI, SgfI, SgrAI, SrfI, Sse8387I and
SwaI. Suitable second restriction endonucleases recognizing a four-nucleotide sequence
are MboI, DpnII, Sau3AI, Tsp509I, HpaII, BfaI, Csp6I, MseI, HhaI, NlaIII, TaqI, MspI,
MaeII and HinP1I.

Conditions for digestion of the cDNA are well-known in the art and are 
described, for example, in J. Sambrook et al., "Molecular Cloning: A Laboratory 
Manual," Vol. 1, Ch. 5, "Enzymes Used in Molecular Cloning." 

D. Insertion of Cleaved cDNA into a Vector

The cDNA sample cleaved with the first and second restriction endonucleases is
then inserted into a vector. In general, a suitable vector includes a multiple cloning site
having a NotI restriction endonuclease site. A suitable vector is the plasmid pBC SK+
that has been cleaved with the restriction endonucleases ClaI and NotI. The vector
contains a bacteriophage-specific promoter. Typically, the promoter is a T3 promoter, a
SP6 promoter, or a T7 promoter. A preferred promoter is bacteriophage T3 promoter.
The cleaved cDNA is inserted into the vector in an orientation that is antisense with
respect to the bacteriophage-specific promoter.

In another preferred embodiment, the vector contains at least one bacteriophage 
promoter chosen from the group consisting of T3 promoter, T7 promoter and SP6
promoter. In one especially preferred embodiment, the vector contains a T3 promoter. In
a preferred embodiment, the vector includes a multiple cloning site having a nucleotide 
sequence chosen from the group consisting of SEQ ID NO: 20, SEQ ID NO: 21 and 
SEQ ID NO: 22. 

Preferred vectors are based on the plasmid vector pBluescript (pBS or pBC) SK+
(Stratagene) in which a portion of the nucleotide sequence from positions 656 to 764 was
removed and replaced with a sequence of at least 110 nucleotides including a NotI
restriction endonuclease site. This region, designated the multiple cloning site (MCS),
spans the portion of the nucleotide sequence from the SacI site to the KpnI site.

A suitable plasmid vector, such as pBC SK+ or pBS SK+ (Stratagene), was
digested with a suitable restriction endonuclease to remove at least 100 nucleotides of the
multiple cloning site. In the case of pBS SK+, suitable restriction endonucleases for
removing the multiple cloning site are SacI and KpnI. A cDNA portion comprising a
new multiple cloning site, having ends that are compatible with NotI and ClaI after
digestion with first and second restriction endonucleases, was cloned into the vector to
form a suitable plasmid vector. Preferred cDNA portions comprising new multiple
cloning sites include those having the nucleotide sequences described in SEQ ID NO:
20, SEQ ID NO: 21 and SEQ ID NO: 22. cDNA clones are linearized by digestion with
a single restriction endonuclease that recognizes the 4-nucleotide sequence of the second
restriction endonuclease site.

A preferred plasmid vector, referred to herein as pBS SK+/DGT1, comprises the
MCS of SEQ ID NO: 20. The pairs for second restriction endonuclease and linearization
restriction endonuclease are, respectively: MspI and SmaI; HinP1I and NarI; TaqI and
XhoI; MaeII and AatII.

Another preferred plasmid vector, referred to herein as pBS SK+/DGT2,
comprises the MCS of SEQ ID NO: 21, and was prepared as described above for pBS
SK+/DGT1. For pBS SK+/DGT2, the pairs for second restriction endonuclease and
linearization restriction endonuclease are, respectively: MspI and SmaI; HinP1I and
NarI; and TaqI and XhoI.

Another preferred plasmid vector, referred to herein as pBS SK+/DGT3,
comprises the MCS of SEQ ID NO: 22. The pairs for second restriction endonuclease
and linearization restriction endonuclease are, respectively: MspI and SmaI; HinP1I and
NarI; TaqI and XhoI; MaeII and AatII.

In a preferred embodiment the vector includes a vector stuffer sequence that
comprises an internal vector stuffer restriction endonuclease site between the first and
second vector restriction endonuclease sites. In one such embodiment, the
linearization step includes digestion of the vector with a restriction endonuclease which
cleaves the vector at the internal vector stuffer restriction endonuclease site. In another
such embodiment, the restriction endonuclease used in the linearization step also cleaves
the vector at the internal vector stuffer restriction endonuclease site.

E. Transformation of a Suitable Host Cell

The vector into which the cleaved DNA has been inserted is then used to
transform a suitable host cell that can be efficiently transformed or transfected by the
vector containing the insert. Suitable host cells for cloning are described, for example, in
Sambrook et al., "Molecular Cloning: A Laboratory Manual," supra. Typically, the host
cell is prokaryotic. A particularly suitable host cell is a strain of E. coli. A suitable E.
coli strain is MC1061. Preferably, a small aliquot is also used to transform E. coli strain
XL1-Blue so that the percentage of clones with inserts is determined from the relative
percentages of blue and white colonies on X-gal plates. Only libraries with in excess of
5 x 10^5 recombinants are typically acceptable.

F. Generation of Linearized Fragments

Plasmid preparations, typically as minipreps, are then made from each of the
cDNA libraries. Linearized fragments are then generated by digestion with at least one
restriction endonuclease.

In one embodiment, the vector is the plasmid pBC SK+ and MspI is used both as the
second restriction endonuclease and as the linearization restriction endonuclease.

In another embodiment, the vector is the plasmid pBC SK+, the second restriction
endonuclease is chosen from the group consisting of MspI, MaeII, TaqI and HinP1I, and
the linearization is accomplished by a first digestion with SmaI followed by a second
digestion with a mixture of KpnI and ApaI.

In other embodiments the vector is chosen from the group consisting of pBS SK+/DGT1,
pBS SK+/DGT2 and pBS SK+/DGT3. In such embodiments, one suitable
enzyme combination is provided where the second restriction endonuclease is MspI and
the restriction endonuclease used in the linearization step is SmaI. Another suitable
combination is provided where the second restriction endonuclease is TaqI and the
restriction endonuclease used in the linearization step is XhoI. A further suitable
combination is provided where the second restriction endonuclease is HinP1I and the
restriction endonuclease used in the linearization step is NarI. Yet another suitable
combination is provided where the second restriction endonuclease is MaeII and the
restriction endonuclease used in the linearization step is AatII. In general, in the
linearization step, any plasmid vector lacking the cDNA insert is cleaved at the
6-nucleotide recognition site (underlined in Figure 3A) for SmaI, NarI, XhoI, or AatII
found between the NotI site and the ClaI site. In contrast, plasmid vectors containing
inserts are cleaved at the 6-nucleotide recognition site for SmaI, NarI, XhoI, or AatII
found 3' to the ClaI site.

In another embodiment, an aliquot of each of the cloned inserts is divided into
two pools, one of which is cleaved with XhoI and the second with SalI. The pools of
linearized plasmids are combined, mixed, then divided into thirds. The thirds are
digested with HindIII, BamHI, and EcoRI. This procedure is followed because, in order
to generate antisense transcripts of the inserts with T3 RNA polymerase, the template
must first be cleaved with a restriction endonuclease that cuts within flanking sequences
but not within the inserts themselves. Given that the average length of the 3'-terminal
MspI fragments is 256 base pairs, approximately 6% of the inserts contain sites for any
enzyme with a hexamer recognition sequence. Those inserts would be lost to further
analysis were only a single enzyme utilized. Hence, it is preferable to divide the reaction
so that only one of the two enzymes is used for linearization of each half reaction.
Only inserts containing sites for both enzymes (approximately 0.4%) are lost from both
halves of the samples. Similarly, each cRNA sample is contaminated to a different
extent with transcripts from insertless plasmids, which could lead to variability in the
efficiency of the later polymerase chain reactions for different samples because of
differential competition for primers. Cleavage of thirds of the samples with one of three
enzymes that have single targets in pBC SK+ between its ClaI and NotI sites eliminates
the production of transcripts containing binding sites for the eventual 5' primers in the
PCR process from insertless plasmids. The use of three enzymes on thirds of the
reaction reduces the loss of insert-containing sequences that also contain sites for the
enzyme while solving the problem of possible contamination by insertless sequences. If
only one enzyme were used, about 10% of the insert-containing sequences would be lost,
but this is reduced to about 0.1%, because only those sequences that are cleaved by
all three enzymes are lost.
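
The loss estimates in the preceding paragraph follow from simple probability arithmetic. The sketch below is illustrative only: it assumes random sequence with equal base frequencies (an idealization not stated in the text) and independent cutting by each enzyme, and the function names are hypothetical.

```python
def fraction_with_site(insert_length_bp: int, site_length: int = 6) -> float:
    """Approximate fraction of inserts that contain a given recognition site.

    Assumes random sequence with equal base frequencies, so the expected
    number of sites is insert_length / 4**site_length; for small values this
    also approximates the chance of at least one site.
    """
    return insert_length_bp / 4 ** site_length


def fraction_lost(per_enzyme_fraction: float, n_enzymes: int) -> float:
    """Fraction of inserts cut by every one of n independent enzymes."""
    return per_enzyme_fraction ** n_enzymes


single = fraction_with_site(256)        # 256 / 4096, i.e. roughly 6%
both_halves = fraction_lost(single, 2)  # roughly 0.4%, lost from both halves
all_thirds = fraction_lost(0.10, 3)     # roughly 0.1%, using the 10% figure

print(f"{single:.4f} {both_halves:.4f} {all_thirds:.4f}")
```

Under these assumptions the per-enzyme figure of about 6% squares to about 0.4% for two enzymes, and the 10% figure cubes to about 0.1% for three.
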

G. Generation of cRNA 

The next step is the generation of a cRNA preparation of antisense cRNA
transcripts. This is performed by incubation of the linearized fragments with an RNA
polymerase capable of initiating transcription from the bacteriophage-specific promoter.
Typically, as discussed above, the promoter is a T3 promoter, and the polymerase is
therefore T3 RNA polymerase. The polymerase is incubated with the linearized
fragments and the four ribonucleoside triphosphates under conditions suitable for
synthesis.

H. Transcription of First-Strand cDNA

In a preferred embodiment, the cRNA preparation is divided into a number of
subpools, the number depending on the number of phasing residues of the 5'-RT primer.
First-strand cDNA is then transcribed from each subpool, using Moloney murine
leukemia virus (MMLV) reverse transcriptase (Life Technologies). With this reverse
transcriptase, annealing is performed at 42°C, and the transcription reaction at 42°C.

The reaction in each subpool uses one of the 5'-RT primers defined as having a
3'-terminus consisting of -N_x, wherein "N" is one of the four deoxyribonucleotides A, C,
G, or T, and "x" is an integer from 1 to 5, the 5'-RT primer being 15 to 30 nucleotides in
length and complementary to the 5' flanking vector sequence with the 5'-RT primer's
complementarity extending across into the insert-specific nucleotides of the cRNA in a
number of nucleotides equal to "x", wherein a different one of said 5'-RT primers is
used in different subpools and wherein there are 4 subpools if "x" = 1, 16 subpools if "x"
= 2, 64 subpools if "x" = 3, 256 subpools if "x" = 4, and 1,024 subpools if "x" = 5.

In another embodiment, the cRNA preparation is divided into sixteen subpools.
First-strand cDNA is then transcribed from each subpool, using a thermostable reverse
transcriptase and a 5'-RT primer as described below. A preferred transcriptase is the
recombinant reverse transcriptase from Thermus thermophilus, known as rTth, available
from Perkin-Elmer (Norwalk, CT). This enzyme is also known as an RNA-dependent
DNA polymerase. With this reverse transcriptase, annealing is performed at 60°C, and
the transcription reaction at 70°C. This promotes high fidelity complementarity between
the 5'-RT primer and the cRNA. The 5'-RT primer used is one of the sixteen 5'-RT
primers whose 3'-terminus is -N-N, wherein N is one of the four deoxyribonucleotides
A, C, G, or T, the 5'-RT primer being at least 15 nucleotides in length, corresponding in
sequence to the 3'-end of the bacteriophage-specific promoter, and extending across into
at least the first two nucleotides of the cRNA.

Where the bacteriophage-specific promoter is the T3 promoter, the 5'-RT primers
typically have the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO:
3), A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 5), A-G-G-T-C-G-A-
C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6), G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-
N (SEQ ID NO: 9), G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 7), T-C-G-A-
C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 13), C-G-A-C-G-G-T-A-T-C-G-G-N-N-
N-N (SEQ ID NO: 14), G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 15), or
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 24).
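
Each degenerate sequence above stands for a panel of concrete primers, one for each choice of the terminal -N residues. As an illustration only, the following Python sketch expands such a template into its members; the function name is hypothetical and the sequence is SEQ ID NO: 3 written without hyphens.

```python
from itertools import product

BASES = "ACGT"


def expand_primer(template: str) -> list[str]:
    """Expand every 'N' in a primer template into A, C, G and T."""
    n_count = template.count("N")
    primers = []
    for combo in product(BASES, repeat=n_count):
        fill = iter(combo)  # one replacement base per 'N', left to right
        primers.append("".join(next(fill) if b == "N" else b for b in template))
    return primers


# SEQ ID NO: 3 as listed above, written without hyphens.
seq_id_3 = "AGGTCGACGGTATCGGNN"
panel = expand_primer(seq_id_3)
print(len(panel))            # number of primers, one per subpool when x = 2
print(panel[0], panel[-1])
```

A template with x terminal N positions expands to 4^x primers, which is why the number of subpools grows as 4, 16, 64, 256 and 1,024 for x = 1 to 5.
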
I. PCR Reaction 

The next step is the use of the product of transcription in each of the subpools as a 
template for a polymerase chain reaction with primers as described below to produce 
polymerase chain reaction amplified fragments. 

In general, the product of first-strand cDNA transcription in each of the subpools
is used as a template for a polymerase chain reaction with a 3'-PCR primer and a 5'-PCR
primer to produce polymerase chain reaction amplified fragments. The 3'-PCR primer
typically is 15 to 30 nucleotides in length, and is complementary to 3' flanking vector
sequences between the first restriction endonuclease site and the site defining
transcription initiation by the bacteriophage-specific promoter. The 5'-PCR primers
have a 3'-terminus consisting of -N_x-N_y, where "N" and "x" are as in the reverse
transcriptase step above, -N_x is the same sequence as in the 5'-RT primer from which
first-strand cDNA was made for that subpool, and "y" is a whole integer such that x + y
equals an integer selected from the group consisting of 3, 4, 5 and 6, the 5'-PCR primer
being 15 to 30 nucleotides in length and complementary to the 5' flanking vector
sequence with the 5'-PCR primer's complementarity extending across into the
insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x + y".

In another embodiment, the primers used are: (a) a 3'-PCR primer that
corresponds in sequence to a sequence in the vector adjoining the site of insertion of the
cDNA sample in the vector; and (b) a 5'-PCR primer selected from the group consisting
of: (i) the 5'-RT primer from which first-strand cDNA was made for that subpool;
(ii) the 5'-RT primer from which the first-strand cDNA was made for that subpool
extended at its 3'-terminus by an additional residue -N, where N can be any of A, C, G,
or T; and (iii) the 5'-RT primer used for the synthesis of first-strand cDNA for that
subpool extended at its 3'-terminus by two additional residues -N-N, wherein N can be
any of A, C, G, or T.

When the vector is the plasmid pBC SK+ cleaved with ClaI and NotI, a suitable
3'-PCR primer is G-A-A-C-A-A-A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C-G-C (SEQ ID
NO: 4). In another embodiment, where the vector is the plasmid pBC SK+ cleaved with
ClaI and NotI, a suitable 3'-PCR primer is A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C (SEQ
ID NO: 8). In other embodiments, suitable 3'-PCR primers are G-A-G-C-T-C-C-A-C-
C-G-C-G-G-T (SEQ ID NO: 18) and G-A-G-C-T-C-G-T-T-T-T-C-C-C-A-G (SEQ ID
NO: 19). Where the bacteriophage-specific promoter is the T3 promoter, a suitable
5'-PCR primer can have the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N
(SEQ ID NO: 5), A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6), T-
C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 13), C-G-A-C-G-G-T-A-T-C-G-G-
N-N-N-N (SEQ ID NO: 14), G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 15),
or A-C-G-G-T-A-T-C-G-G-N-N-N-N-N-N (SEQ ID NO: 16).

Typically, PCR is performed in the presence of 35S-dATP using a PCR program
of 15 seconds at 94°C for denaturation, 15 seconds at 50°C - 62°C for annealing, and 30
seconds at 72°C for synthesis on a Perkin-Elmer 9600 apparatus (Perkin-Elmer Cetus,
Norwalk, CT). The annealing temperature is optimized for the specific nucleotide
sequence of the primer, using principles well known in the art. The high temperature
annealing step minimizes artifactual mispriming by the 5'-primer at its 3'-end and
promotes high fidelity copying.

Alternatively, the PCR amplification can be carried out in the presence of a
32P-labeled or 33P-labeled deoxyribonucleoside triphosphate, such as [32P]dCTP or
[33P]dCTP. However, it is generally preferred to use a 35S-labeled deoxyribonucleoside
triphosphate for maximum resolution. Other detection methods, including
nonradioactive labels, can also be used.

The series of reactions produces a number of product pools depending on the
chosen 5'-PCR primers. As discussed above, the 5'-PCR primers have a 3'-terminus
consisting of -N_x-N_y, where "N" and "x" are as in the reverse transcriptase step above,
-N_x is the same sequence as in the 5'-RT primer from which first-strand cDNA was made
for that subpool, and "y" is a whole integer such that x + y equals an integer selected from
the group consisting of 3, 4, 5 and 6, the 5'-PCR primer being 15 to 30 nucleotides in
length and complementary to the 5' flanking vector sequence with the 5'-PCR primer's
complementarity extending across into the insert-specific nucleotides of the cRNA in a
number of nucleotides equal to "x + y". The number of subpools is determined by x + y:
there are 64 subpools if x + y = 3, 256 subpools if x + y = 4, 1,024 subpools if x + y = 5,
and 4,096 subpools if x + y = 6.
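
The nesting of 5'-PCR primers within each reverse transcription subpool can be sketched as follows. This is a minimal illustration, not the procedure itself: the function names are hypothetical, the example primer is one assumed member (-N = A) of SEQ ID NO: 9, and the sketch simply appends bases at the 3'-terminus, ignoring the 5'-end trimming seen in some of the listed primer sequences.

```python
from itertools import product


def pcr_primers_for_subpool(rt_primer: str, y: int) -> list[str]:
    """5'-PCR primers that extend one fully specified 5'-RT primer
    by y further insert-specific bases at its 3'-terminus."""
    return [rt_primer + "".join(ext) for ext in product("ACGT", repeat=y)]


def total_pools(x: int, y: int) -> int:
    """Number of product pools across all subpools: 4 ** (x + y)."""
    return 4 ** (x + y)


# Hypothetical subpool primed with the -N = A member of SEQ ID NO: 9 (x = 1).
rt_primer = "GGTCGACGGTATCGGA"
extended = pcr_primers_for_subpool(rt_primer, y=2)
print(len(extended), extended[:3])
print([total_pools(1, y) for y in (2, 3, 4, 5)])
```

Every extended primer keeps the -N_x bases of its parent 5'-RT primer, so each PCR pool is a subdivision of exactly one reverse transcription subpool.
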

The process of the present invention can be extended by using longer sets of
5'-PCR primers extended at their 3'-end by additional nucleotides. For example, a 5'-PCR
primer with the 3'-terminus -N-N-N-N-N would give 1,024 products and a primer with
the 3'-terminus -N-N-N-N-N-N would give 4,096 products.

J. Electrophoresis 

The polymerase chain reaction amplified fragments are then resolved by
electrophoresis to display bands representing the 3'-ends of mRNAs present in the
sample.

Electrophoretic techniques for resolving PCR amplified fragments are well-
understood in the art and need not be further recited here. The corresponding products
are resolved in denaturing DNA sequencing gels and visualized by autoradiography. For
the particular vector system described herein, the gels are run so that the first 90 base
pairs run off the bottom, since vector-related sequences increase the length of the cDNAs
by 55 base pairs. This number can vary if other vector systems are employed, and the
appropriate electrophoresis conditions so that vector-related sequences run off the bottom
of the gels can be determined from a consideration of the sequences of the vector
involved. Typically, each reaction is run on a separate denaturing gel, so that at least two
gels are used. It is preferred to perform a series of reactions in parallel, such as from
different tissues, and resolve all of the reactions using the same primer on the same gel.
A substantial number of reactions can be resolved on the same gel. Typically, as many
as thirty reactions can be resolved on the same gel and compared. As discussed below,
this provides a way of determining tissue-specific mRNAs.

Typically, autoradiography is used to detect the resolved cDNA species. 
However, other detection methods, such as phosphorimaging or fluorescence, can also be 
used, and may provide higher sensitivity in certain applications. 

According to the scheme, the cDNA libraries produced from each of the mRNA
samples contain copies of the extreme 3'-ends from the most distal site for MspI to the
beginning of the poly(A) tail of all poly(A)+ mRNAs in the starting RNA sample
approximately according to the initial relative concentrations of the mRNAs. Because
both ends of the inserts for each species are exactly defined by sequence, their lengths are
uniform for each species, allowing their later visualization as discrete bands on a gel,
regardless of the tissue source of the mRNA.

The use of successive steps with lengthening primers to survey the cDNAs
essentially acts like a nested PCR. These steps enhance quality control and diminish the
background that potentially could result from amplification of untargeted cDNAs. In a
preferred embodiment, the second reverse transcription step subdivides each cRNA
sample into four subpools, utilizing a primer that anneals to the sequences derived from
pBC SK+ but extends across the CGG of the non-regenerated MspI site and includes one
nucleotide (-N) of the insert. This step segregates the starting population of potentially
50,000 to 100,000 mRNAs into four subpools of approximately 12,500 to 25,000
members each. In serial iterations of the subsequent PCR step, in which radioactive label
is incorporated into the products for their autoradiographic visualization, those pools are
further segregated by division into four or sixteen subsubpools by using progressively
longer 5'-PCR primers containing 2-6 nucleotides of the insert.
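
The pool sizes quoted above can be reproduced with a short calculation. The sketch below is illustrative only and assumes, as a simplification not made in the text, that mRNAs are spread evenly over the possible insert-specific nucleotides; the function name is hypothetical.

```python
def members_per_pool(n_mrnas: int, insert_bases: int) -> float:
    """Average number of mRNA species per pool when the primer reads
    insert_bases nucleotides into the insert (even distribution assumed)."""
    return n_mrnas / 4 ** insert_bases


for k in range(1, 7):
    low = members_per_pool(50_000, k)
    high = members_per_pool(100_000, k)
    print(f"{k} insert base(s): {4 ** k:>5} pools, "
          f"about {low:,.0f} to {high:,.0f} mRNAs each")
```

With one insert nucleotide this gives the 12,500 to 25,000 members per subpool stated above; with six it falls to a few tens of species per pool, a number that can plausibly be resolved as discrete bands in one lane.
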

By first demanding by high temperature annealing a high fidelity 3'-end match at
the reverse transcription step in the -N positions, and subsequently demanding again such
high fidelity matching into -N-N, -N-N-N, -N-N-N-N, -N-N-N-N-N or -N-N-N-N-N-N
iterations, bleedthrough from mismatched priming at the -N positions is drastically
minimized.

The steps of the process beginning with dividing the cRNA preparation into four
subpools and transcribing first-strand cDNA from each subpool can be performed
separately as a method of simultaneous sequence-specific identification of mRNAs
corresponding to members of an antisense cRNA pool representing the 3'-ends of a
population of mRNAs.

II. APPLICATIONS OF THE METHOD FOR DISPLAY OF mRNA PATTERNS 

The method described above for the detection of patterns of mRNA expression in 
a tissue and the resolving of these patterns by gel electrophoresis has a number of 
applications. One of these applications is its use for the detection of a change in the 
pattern of mRNA expression in a tissue associated with a physiological or pathological 
change. In general, this method comprises: 

(1) obtaining a first sample of a tissue that is not subject to the
physiological or pathological change;

(2) determining the pattern of mRNA expression in the first sample of the
tissue by performing the method of simultaneous sequence-specific identification of
mRNAs corresponding to members of an antisense cRNA pool representing the 3'-ends
of a population of mRNAs as described above to generate a first display of bands
representing the 3'-ends of mRNAs present in the first sample;

(3) obtaining a second sample of the tissue that has been subject to the
physiological or pathological change;

(4) determining the pattern of mRNA expression in the second sample of
the tissue by performing the method of simultaneous sequence-specific identification of
mRNAs corresponding to members of an antisense cRNA pool representing the 3'-ends
of a population of mRNAs as described above to generate a second display of bands
representing the 3'-ends of mRNAs present in the second sample; and

(5) comparing the first and second displays to determine the effect of the 
physiological or pathological change on the pattern of mRNA expression in the tissue. 

Typically, the comparison is made in adjacent lanes of a single gel.

Typically, a database comprising the data produced by the quantitation of the
display of sequence-specific products is constructed and maintained using suitable
computer hardware and computer software. Preferably, such a database further
comprises data concerning sequence relationships, gene mapping, cellular distributions,
and any other information considered relevant to gene function.
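
One way to picture the comparison of two displays, once band intensities have been quantitated, is sketched below. Everything here is an assumption made for illustration: the text does not specify a data layout, so bands are keyed by a hypothetical (primer 3'-terminus, fragment length) pair, the intensities are made up, and the two-fold threshold is arbitrary.

```python
from typing import Dict, Tuple

Band = Tuple[str, int]       # (5'-PCR primer 3'-terminus, fragment length in nt)
Display = Dict[Band, float]  # band -> quantitated intensity (arbitrary units)


def changed_bands(first: Display, second: Display,
                  min_fold: float = 2.0) -> Dict[Band, float]:
    """Return bands whose intensity differs by at least min_fold.

    The value is second/first; a band seen only in the second display is
    reported as float('inf') and one seen only in the first as 0.0.
    """
    result: Dict[Band, float] = {}
    for band in sorted(set(first) | set(second)):
        a, b = first.get(band, 0.0), second.get(band, 0.0)
        if a == 0.0 and b == 0.0:
            continue
        if a == 0.0:
            result[band] = float("inf")
        elif b == 0.0:
            result[band] = 0.0
        elif b / a >= min_fold or a / b >= min_fold:
            result[band] = b / a
    return result


# Made-up intensities, for illustration only.
control = {("ACGT", 212): 10.0, ("ACGT", 340): 5.0, ("GGCA", 150): 8.0}
treated = {("ACGT", 212): 11.0, ("ACGT", 340): 20.0, ("TTAG", 98): 3.0}
print(changed_bands(control, treated))
```

Keying each band by primer and length mirrors the fact that both ends of every insert are defined by sequence, so the same mRNA should give the same key in every sample.
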

The tissue can be derived from the central nervous system. In particular, it can be
derived from a structure within the central nervous system that is the retina, cerebral
cortex, olfactory bulb, thalamus, hypothalamus, anterior pituitary, posterior pituitary,
hippocampus, nucleus accumbens, amygdala, striatum, cerebellum, brain stem,
suprachiasmatic nucleus, or spinal cord. When the tissue is derived from the central
nervous system, the physiological or pathological change can be any of Alzheimer's
disease, parkinsonism, ischemia, alcohol addiction, drug addiction, schizophrenia,
amyotrophic lateral sclerosis, multiple sclerosis, depression, and bipolar
manic-depressive disorder. Alternatively, the method of the present invention can be used to
study circadian variation, aging, or long-term potentiation, the latter affecting the
hippocampus. Additionally, particularly with reference to mRNA species occurring in
particular structures within the central nervous system, the method can be used to study
brain regions that are known to be involved in complex behaviors, such as learning and
memory, emotion, drug addiction, glutamate neurotoxicity, feeding behavior, olfaction,
viral infection, vision, and movement disorders.

This method can also be used to study the results of the administration of drugs
and/or toxins to an individual by comparing the mRNA pattern of a tissue before and
after the administration of the drug or toxin. Results of electroshock therapy can also be
studied.

Alternatively, the tissue can be from an organ or organ system that includes the
cardiovascular system, the pulmonary system, the digestive system, the peripheral
nervous system, the liver, the kidney, skeletal muscle, and the reproductive system, or
from any other organ or organ system of the body. For example, mRNA patterns can be
studied from liver, heart, kidney, or skeletal muscle. Additionally, for any tissue,
samples can be taken at various times so as to discover a circadian effect on mRNA
expression. Thus, this method can ascribe particular mRNA species to involvement in
particular patterns of function or malfunction.

The antisense cRNA pool representing the 3'-ends of mRNAs can be generated
by steps (1)-(4) of the method as described above in Section I.

Similarly, the mRNA resolution method of the present invention can be used as
part of a method of screening for a side effect of a drug. In general, such a method
comprises:

(1) obtaining a first sample of tissue from an organism treated with a
compound of known physiological function;

(2) determining the pattern of mRNA expression in the first sample of the
tissue by performing the method of simultaneous sequence-specific identification of
mRNAs corresponding to members of an antisense cRNA pool representing the 3'-ends
of a population of mRNAs, as described above, to generate a first display of bands
representing the 3'-ends of mRNAs present in the first sample;

(3) obtaining a second sample of tissue from an organism treated with a
drug to be screened for a side effect;

(4) determining the pattern of mRNA expression in the second sample of
the tissue by performing the method of simultaneous sequence-specific identification of
mRNAs corresponding to members of an antisense cRNA pool representing the 3'-ends
of a population of mRNAs, as described above, to generate a second display of bands
representing the 3'-ends of mRNAs present in the second sample; and

(5) comparing the first and second displays in order to detect the presence
of mRNA species whose expression is not affected by the known compound but is
affected by the drug to be screened, thereby indicating a difference in action of the drug
to be screened and the known compound and thus a side effect.
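
The comparison in step (5) amounts to a set difference. The following Python sketch is a hypothetical illustration: it assumes that the bands altered by each treatment have already been identified relative to some common reference display (a detail the steps above do not spell out), and the band keys and values are made up.

```python
from typing import Set, Tuple

Band = Tuple[str, int]  # (5'-PCR primer 3'-terminus, fragment length in nt)


def side_effect_candidates(changed_by_known: Set[Band],
                           changed_by_drug: Set[Band]) -> Set[Band]:
    """Bands altered by the drug under test but not by the known compound."""
    return changed_by_drug - changed_by_known


# Made-up band sets, for illustration only.
known = {("ACGT", 212), ("GGCA", 150)}
drug = {("ACGT", 212), ("TTAG", 98)}
print(sorted(side_effect_candidates(known, drug)))
```

Bands shared by both treatments reflect the common mode of action and drop out, leaving only the species that respond to the drug being screened.
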

In particular, this method can be used for drugs affecting the central nervous
system, such as antidepressants, neuroleptics, tranquilizers, anticonvulsants, monoamine
oxidase inhibitors, and stimulants. However, this method can in fact be used for any
drug that may affect mRNA expression in a particular tissue. For example, the effect on
mRNA expression of anti-parkinsonism agents, skeletal muscle relaxants, analgesics,
local anesthetics, cholinergics, antispasmodics, steroids, non-steroidal anti-inflammatory
drugs, antiviral agents, or any other drug capable of affecting mRNA expression can be
studied, and the effect determined in a particular tissue or structure.

A further application of the method of the present invention is in obtaining the
sequence of the 3'-ends of mRNA species that are displayed. In general, a method of
obtaining the sequence comprises:

(1) eluting at least one cDNA corresponding to an mRNA from an
electropherogram in which bands representing the 3'-ends of mRNAs present in the
sample are displayed;

(2) amplifying the eluted cDNA in a polymerase chain reaction;

(3) cloning the amplified cDNA into a plasmid;

(4) producing DNA corresponding to the cloned DNA from the plasmid; and

(5) sequencing the cloned cDNA.

The cDNA that has been excised can be amplified with the primers previously
used in the PCR step. The cDNA can then be cloned into pCR II (Invitrogen, San Diego,
CA) by TA cloning and ligation into the vector. Minipreps of the DNA can then be
produced by standard techniques from subclones and a portion denatured and split into
two aliquots for automated sequencing by the dideoxy chain termination method of
Sanger. A commercially available sequencer can be used, such as an ABI sequencer, for
automated sequencing. This will allow the determination of complementary sequences
for most cDNAs studied, in the length range of 50-500 bp, across the entire length of the
fragment.

These partial sequences can then be used to scan genomic databases such as
GenBank to recognize sequence identities and similarities using programs such as
BLASTN and BLASTX. Because this method generates sequences from only the
3'-ends of mRNAs, it is expected that open reading frames (ORFs) would be encountered
only occasionally, as the 3'-untranslated regions of brain mRNAs are on average longer
than 1300 nucleotides (J.G. Sutcliffe, supra). Potential ORFs can be examined for
signature protein motifs.

The cDNA sequences obtained can then be used to design primer pairs for
semiquantitative PCR to confirm tissue expression patterns. Selected products can also
be used to isolate full-length cDNA clones for further analysis. Primer pairs can be used
for SSCP-PCR (single strand conformation polymorphism-PCR) amplification of
genomic DNA. For example, such amplification can be carried out from a panel of
interspecific backcross mice to determine linkage of each PCR product to markers
already linked. This can result in the mapping of new genes and can serve as a resource
for identifying candidates for mapped mouse mutant loci and homologous human disease
genes. SSCP-PCR uses synthetic oligonucleotide primers that amplify, via PCR, a small
(100-200 bp) segment. (M. Orita et al., "Detection of Polymorphisms of Human DNA
by Gel Electrophoresis as Single-Strand Conformation Polymorphisms," Proc. Natl.
Acad. Sci. USA 86: 2766-2770 (1989); M. Orita et al., "Rapid and Sensitive Detection of
Point Mutations and DNA Polymorphisms Using the Polymerase Chain Reaction,"
Genomics 5: 874-879 (1989)).

The excised fragments of cDNA can be radiolabeled by techniques well-known in
the art for use in probing a northern blot or for in situ hybridization to verify mRNA
distribution and to learn the size and prevalence of the corresponding full-length mRNA.
The probe can also be used to screen a cDNA library to isolate clones for more reliable
and complete sequence determination. The labeled probes can also be used for any other
purpose, such as studying in vitro expression.



III. PANELS AND DEGENERATE MIXTURES OF PRIMERS

Another aspect of the present invention is panels of primers and degenerate 
mixtures of primers suitable for the practice of the present invention. These include: 

(1) a panel of 5'-RT primers comprising 16 primers of the sequence
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3), wherein N is one of the four
deoxyribonucleotides A, C, G, or T;

(2) a panel of 5'-PCR primers comprising 64 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 5), wherein N is one of the
four deoxyribonucleotides A, C, G, or T;

(3) a panel of 5'-PCR primers comprising 256 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6), wherein N is one of
the four deoxyribonucleotides A, C, G, or T;

(4) a panel of 5'-PCR primers comprising 1024 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N (SEQ ID NO: 24), wherein N is
one of the four deoxyribonucleotides A, C, G, or T;

(5) a panel of 5'-PCR primers comprising 4096 primers of the sequences
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N-N (SEQ ID NO: 25), wherein N is
one of the four deoxyribonucleotides A, C, G, or T;

(6) a panel of anchor primers comprising 12 primers of the sequences
A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-
T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a deoxyribonucleotide selected
from the group consisting of A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T;

(7) a panel of anchor primers comprising 3 primers of the sequences
A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-
T-T-T-T-T-T-T-T-T-V (SEQ ID NO: 23), wherein V is a deoxyribonucleotide selected
from the group consisting of A, C, and G;

(8) a panel of 5'-RT primers comprising 4 different oligonucleotides each
having the sequence G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N (SEQ ID NO: 9), wherein
N is one of the four deoxyribonucleotides A, C, G, or T;

(9) a panel of 5'-RT primers comprising 16 different oligonucleotides
each having the sequence G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 7),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(10) a panel of 5'-PCR primers comprising 64 different oligonucleotides
each having the sequence T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 13),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(11) a panel of 5'-PCR primers comprising 256 different oligonucleotides
each having the sequence C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 14),
wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(12) a panel of 5'-PCR primers comprising 1024 different
oligonucleotides each having the sequence G-A-C-G-G-T-A-T-C-G-G-N-N-N-N-N
(SEQ ID NO: 15), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(13) a panel of 5'-PCR primers comprising 4096 different
oligonucleotides each having the sequence A-C-G-G-T-A-T-C-G-G-N-N-N-N-N-N
(SEQ ID NO: 16), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(14) a panel of 5'-RT primers comprising 4 different oligonucleotides
each having the sequence C-T-T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N (SEQ ID
NO: 10), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(15) a panel of 5'-RT primers comprising 16 different oligonucleotides
each having the sequence T-T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N (SEQ ID
NO: 11), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(16) a panel of 5'-PCR primers comprising 64 different oligonucleotides
each having the sequence T-C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N (SEQ ID
NO: 12), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(17) a panel of 5'-PCR primers comprising 256 different oligonucleotides
each having the sequence C-A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N (SEQ ID
NO: 17), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(18) a panel of 5'-PCR primers comprising 1024 different
oligonucleotides each having the sequence A-G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N-N
(SEQ ID NO: 26), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(19) a panel of 5'-PCR primers comprising 4096 different
oligonucleotides each having the sequence G-T-C-A-G-G-C-T-A-A-T-C-G-G-N-N-N-N-N-N
(SEQ ID NO: 27), wherein N is one of the four deoxyribonucleotides A, C, G, or T;

(20) a degenerate mixture of anchor primers comprising a mixture of 3 
primers of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G- 
A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V (SEQ ID NO: 23), wherein V is a 
deoxyribonucleotide selected from the group consisting of A, C, and G, each of the 3 
primers being present in about an equimolar quantity; and 

(21) a degenerate mixture of anchor primers comprising a mixture of 12 
primers of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G- 
A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a 
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a 
deoxyribonucleotide selected from the group consisting of A, C, G, and T, each of the 12 
primers being present in about an equimolar quantity. 
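The panel sizes recited above follow from enumerating every choice at each degenerate position: four bases per N and three per V. The following is a minimal sketch of that enumeration, not part of the disclosed method. The function and variable names are illustrative assumptions; the core sequences are the fixed portions of SEQ ID NO: 3 and SEQ ID NO: 2 as written in the list above, with hyphens removed.

```python
from itertools import product

BASES = "ACGT"   # N: any of the four deoxyribonucleotides
V_BASES = "ACG"  # V: A, C, or G (never T)


def primer_panel(core: str, n_positions: int) -> list[str]:
    """Return every primer formed by appending n_positions N residues to core."""
    return [core + "".join(tail) for tail in product(BASES, repeat=n_positions)]


def anchor_panel(core: str, with_n: bool) -> list[str]:
    """Return anchor primers ending in -V (3 primers) or -V-N (12 primers)."""
    tails = [v + n for v in V_BASES for n in (BASES if with_n else [""])]
    return [core + tail for tail in tails]


# Fixed portion shared by SEQ ID NO: 3, 5, 6, 24 and 25.
CORE_5PRIME = "AGGTCGACGGTATCGG"
# Fixed portion of SEQ ID NO: 2 and 23, ending in the tract of 18 T residues.
ANCHOR_CORE = "AACTGGAAGAATTCGCGGCCGCAGGAA" + "T" * 18

# Panel size for 2 to 6 degenerate N positions.
panel_sizes = {n: len(primer_panel(CORE_5PRIME, n)) for n in range(2, 7)}
print(panel_sizes)
print(len(anchor_panel(ANCHOR_CORE, with_n=False)),
      len(anchor_panel(ANCHOR_CORE, with_n=True)))
```

Each additional N position multiplies the panel size by four, which is why the 5'-PCR panels grow from 64 to 4096 primers as the number of N residues goes from three to six.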

The invention is illustrated by the following Example. The Example is for 
illustrative purposes only and is not intended to limit the invention. 



EXAMPLE 

Resolution of Brain mRNAs Using Primers Corresponding to Sequences of Known
Brain mRNAs of Different Concentrations
To demonstrate the effectiveness of the method of the present invention, it
was applied using 5'-RT primers extended at their 3'-ends by two nucleotides and
corresponding to the sequence of known brain mRNAs of different concentrations, such
as neuron-specific enolase (NSE) at roughly 0.5% concentration (S. Forss-Petter et al.,
"Neuron-Specific Enolase: Complete Structure of Rat mRNA, Multiple Transcriptional
Start Sites and Evidence for Translational Control," J. Neurosci. Res. 16: 141-156
(1986)), RC3 at about 0.01%, and somatostatin at 0.001% (G.H. Travis & J.G. Sutcliffe,
"Phenol Emulsion-Enhanced DNA-Driven Subtractive cDNA Cloning: Isolation of
Low-Abundance Monkey Cortex-Specific mRNAs," Proc. Natl. Acad. Sci. USA 85:
1696-1700 (1988)) to compare cDNAs made from libraries constructed from cerebral cortex,
striatum, cerebellum and liver RNAs made as described above. On short
autoradiographic exposures from any particular RNA sample, 50-100 bands were
obtained. Bands were absolutely reproducible in duplicate samples. Approximately
two-thirds of the bands differed between brain and liver samples, including the bands of the
correct lengths corresponding to the known brain-specific mRNAs. This was confirmed
by excision of the bands from the gels, amplification and sequencing. Only a few bands
differed among samples for various brain regions for any particular 5'-RT primer,
although some band intensities differed.

The band corresponding to NSE, a relatively prevalent mRNA species, appeared
in all of the brain samples but not in the liver samples. It was not observed when any of
the last three single nucleotides within the four-base 3'-terminal sequence -N-N-N-N was
changed in the synthetic 5'-primer. When the first N was changed, a small amount of
bleedthrough was detected. For the known species, the intensity of the autoradiographic
signal was roughly proportional to mRNA prevalence, and mRNAs with concentrations
of one part in 10^5 or greater of the poly(A)+ RNA were routinely visible, with the
occasional problem that cDNAs that migrated close to more intense bands were
obscured.

A sample of the data is shown in Figure 2. In the 5 gel lanes on the left, cortex
cRNA was substrate for reverse transcription with the 5'-RT primer
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3) where -N-N is -C-T (primer 118), -G-T
(primer 116) or -C-G (primer 106). The PCR amplification used 5'-PCR primers
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6) where -N-N-N-N is
-C-T-A-C (primer 128), -C-T-G-A (primer 127), -C-T-G-C (primer 111), -G-T-G-C (primer
134), and -C-G-G-C (primer 130), as indicated in Figure 2. Primers 118 and 111 match
the sequence of the two and four nucleotides, respectively, downstream from the MspI
site located nearest the 3'-end of the NSE mRNA sequence. Primer 127 is
mismatched with the NSE sequence in the last (-1) position, primer 128 in the
next-to-last (-2) position, primers 106 and 130 in the -3 position, and primers 116 and 134 in the
-4 position. Primer 134 extended two nucleotides further upstream than the others shown
here, hence its PCR products are two nucleotides longer relative to the products in other
lanes.
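The mismatch positions stated for the Figure 2 primers can be checked by comparing each primer's 3'-terminal residues with those of primer 111, which matches the NSE sequence at all four positions. The sketch below is illustrative only. It assumes that the two-base 5'-RT suffixes align with the first two of the four positions (that is, -4 and -3), and the helper name is hypothetical.

```python
# 3'-terminal residues of the primers described for Figure 2.
RT_SUFFIXES = {"118": "CT", "116": "GT", "106": "CG"}
PCR_SUFFIXES = {"128": "CTAC", "127": "CTGA", "111": "CTGC",
                "134": "GTGC", "130": "CGGC"}
NSE_MATCH = PCR_SUFFIXES["111"]  # primer 111 matches NSE at all four positions


def mismatch_positions(suffix: str, reference: str = NSE_MATCH) -> list[int]:
    """Return mismatched positions, numbered -1 (3'-most) to -4 as in the text.

    Two-base 5'-RT suffixes are aligned to the first two reference bases,
    i.e. to positions -4 and -3 (an assumption of this sketch).
    """
    return [i - len(reference)
            for i, (a, b) in enumerate(zip(suffix, reference)) if a != b]


for name, suffix in {**RT_SUFFIXES, **PCR_SUFFIXES}.items():
    print(name, suffix, mismatch_positions(suffix))
```

Under these assumptions the sketch reproduces the assignments given in the text: primers 118 and 111 have no mismatch, and each of the remaining primers differs at exactly one position.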

In each lane, 50-100 bands were visible in 15-minute exposures using 32P-dCTP
to radiolabel the products. These bands were apparently distinct for each primer pair,
with the exception that a subset of the 118-111 bands appeared more faintly in the
116-134 lane, trailing by two nucleotides, indicating bleedthrough in the -4 position.

The 118-111 primer set was used again on separate cortex (CX) and liver (LV) 
cRNAs. The cortex pattern was identical to that in lane 118-111, demonstrating 
reproducibility. The liver pattern differed from CX in the majority of species. The
asterisk indicates the position of the NSE product. Analogous primer sets detected RC3
and somatostatin (somat) products (asterisks) in CX but not LV lanes. The relative band
intensities of a given PCR product can be compared within lanes using the same primer
set, but not between different primer sets.

This example demonstrates the feasibility and reproducibility of the method of
the present invention and its ability to resolve different mRNAs. It further demonstrates
that prevalence of particular mRNA species can be estimated from the intensity of the
autoradiographic signal. The assay allows mRNAs present in both high and low
prevalence to be detected simultaneously.

ADVANTAGES OF THE PRESENT INVENTION

The present method can be used to identify genes whose expression is altered
during neuronal development, in models of plasticity and regeneration, in response to
chemical or electrophysiological challenges such as neurotoxicity and long-term
potentiation, and in response to behavioral, viral, or drug/alcohol paradigms, the occurrence
of cell death or apoptosis, aging, pathological conditions, and other conditions affecting
mRNA expression. Although the method is particularly useful for studying gene
expression in the nervous system, it is not limited to the nervous system and can be used
to study mRNA expression in any tissue. The method allows the visualization of nearly
every mRNA expressed by a tissue as a distinct band on a gel whose intensity
corresponds roughly to the concentration of the mRNA.

The method has the advantage that it does not depend on potentially
irreproducible mismatched random priming, so that it provides a high degree of accuracy
and reproducibility. Moreover, it reduces the complications and imprecision generated
by the presence of concurrent bands of different length resulting from the same mRNA
species as the result of different priming events. In methods using random priming, such
concurrent bands can occur and are more likely to occur for mRNA species of high
prevalence. In the present method, such concurrent bands are avoided.

The method provides sequence-specific information about the mRNA species and
can be used to generate primers, probes, and other specific sequences.

Although the present invention has been described in considerable detail, with 
reference to certain preferred versions thereof, other versions are possible. Therefore, the 
spirit and scope of the appended claims should not be limited to the description of the 
preferred versions contained herein.

We claim:

1. A method for simultaneous sequence-specific identification of mRNAs in
an mRNA population comprising the steps of: 

(a) preparing a double-stranded cDNA population from an mRNA 
population using a mixture of anchor primers, the anchor primers each including: (i) a 
tract of from 7 to 40 T residues; (ii) a site for cleavage by a first restriction endonuclease 
that recognizes more than six bases, the site for cleavage being located to the 5'-side of 
the tract of T residues; (iii) a first stuffer segment of from 4 to 40 nucleotides, the first 
stuffer segment being located to the 5'-side of the site for cleavage by the first restriction 
endonuclease; and (iv) phasing residues located at the 3' end of each of the anchor 
primers selected from the group consisting of -V and -V-N, wherein V is a 
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a 
deoxyribonucleotide selected from the group consisting of A, C, G, and T, the mixture 
including anchor primers containing all possibilities for V and N where the phasing 
residues in the mixture are defined by one of -V or -V-N;

(b) cleaving the double-stranded cDNA population with the first 
restriction endonuclease and with a second restriction endonuclease, the second 
restriction endonuclease recognizing a four-nucleotide sequence, to form a population of 
double-stranded cDNA molecules having first and second termini, respectively; 

(c) inserting the double-stranded cDNA molecules from step (b) each 
into a vector in an orientation that is antisense with respect to a bacteriophage-specific 
promoter within the vector to form a population of vectors containing the inserted cDNA 
molecules, said inserting defining 3' and 5' flanking vector sequences such that 5' is
upstream from the sense strand of the inserted cDNA and 3' is downstream of the sense 
strand, and said vector having a 3' flanking nucleotide sequence of from at least 15 
nucleotides in length between said first restriction endonuclease site and a site defining 
transcription initiation in said promoter; 

(d) generating linearized fragments containing the inserted cDNA
molecules by digestion of the vectors produced in step (c) with at least one restriction
endonuclease that does not recognize sequences in the inserted cDNA molecules or in the
bacteriophage-specific promoter, but does recognize sequences in the vector such that the
resulting linearized fragments have a 5' flanking vector sequence of at least 15
nucleotides 5' to the site of insertion of the cDNA sample into the vector at the cDNA's
second terminus;

(e) generating a cRNA preparation of antisense cRNA transcripts by
incubation of the linearized fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the bacteriophage-specific promoter;

(f) dividing the cRNA preparation into subpools and transcribing first-strand
cDNA from each subpool, using a reverse transcriptase and one of the 5'-RT
primers defined as having a 3'-terminus consisting of -N_x, wherein "N" is one of the four
deoxyribonucleotides A, C, G, or T, and "x" is an integer from 1 to 5, the 5'-RT primer
being 15 to 30 nucleotides in length and complementary to the 5' flanking vector
sequence with the primer's complementarity extending across into the insert-specific
nucleotides of the cRNA in a number of nucleotides equal to "x", wherein a different one
of said 5'-RT primers is used in different subpools and wherein there are 4 subpools if
"x" = 1, 16 subpools if "x" = 2, 64 subpools if "x" = 3, 256 subpools if "x" = 4, and 1,024
subpools if "x" = 5;

(g) using the product of first-strand cDNA transcription in each of the
subpools as a template for a polymerase chain reaction with a 3'-PCR primer of 15 to 30
nucleotides in length that is complementary to 3' flanking vector sequences between said
first restriction endonuclease site and the site defining transcription initiation by the
bacteriophage-specific promoter and a 5'-PCR primer having a 3'-terminus consisting of
-N_x-N_y, where "N" and "x" are as in step (f), -N_x is the same sequence as in the 5'-RT
primer from which first-strand cDNA was made for that subpool, and "y" is a whole
integer such that x + y equals an integer selected from the group consisting of 3, 4, 5 and
6, the 5'-PCR primer being 15 to 30 nucleotides in length and complementary to the 5'
flanking vector sequence with the 5'-PCR primer's complementarity extending across
into the insert-specific nucleotides of the cRNA in a number of nucleotides equal to "x +
y", to produce polymerase chain reaction amplified fragments; and

(h) resolving the polymerase chain reaction amplified fragments to
generate a display of sequence-specific products representing the 3'-ends of different
mRNAs present in the mRNA population.
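The subpool counts in step (f) and the primer counts implied by step (g) are powers of four. The sketch below illustrates that arithmetic under the constraints stated in claim 1 (x from 1 to 5, and x + y equal to 3, 4, 5 or 6). The function names are hypothetical, and treating "y" as a non-negative integer is an assumption of this sketch.

```python
def subpool_count(x: int) -> int:
    """Number of cRNA subpools in step (f): one per 5'-RT primer ending in -N_x."""
    if not 1 <= x <= 5:
        raise ValueError("x must be an integer from 1 to 5")
    return 4 ** x


def pcr_primer_count(x: int, y: int) -> int:
    """Number of distinct 5'-PCR primers ending in -N_x-N_y across all subpools."""
    subpool_count(x)  # reuse the range check on x
    if y < 0 or not 3 <= x + y <= 6:
        raise ValueError("x + y must be 3, 4, 5 or 6")
    return 4 ** (x + y)


def pcr_primers_per_subpool(x: int, y: int) -> int:
    """5'-PCR primers that share the -N_x residues of a single subpool."""
    return pcr_primer_count(x, y) // subpool_count(x)


print(subpool_count(2), pcr_primer_count(2, 2), pcr_primers_per_subpool(2, 2))
```

For x = 2 and y = 2, for example, this gives 16 subpools, 256 distinct 5'-PCR primers in total, and 16 PCR primers per subpool.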

2. The method of claim 1 wherein the phasing residues in step (a) are -V-N. 

3. The method of claim 1 wherein the phasing residues in step (a) are -V. 

4. The method of claim 1 wherein the "x" in step (f) is 2.

5. The method of claim 1 wherein the "x" in step (f) is 1. 

6. The method of claim 5 wherein the "y" in step (g) is 3. 

7. The method of claim 5 wherein the "y" in step (g) is 4. 

8. The method of claim 1 wherein the phasing residues in step (a) are -V-N, the
"x" in step (f) is 2, and the "y" in step (g) is 2.

9. The method of claim 1 wherein the phasing residues in step (a) are -V-N, the 
"x" in step (f) is 1, and the "y" in step (g) is 3. 

10. The method of claim 1 wherein the phasing residues in step (a) are -V-N, 
the "x" in step (f) is 1, and the "y" in step (g) is 4. 

11. The method of claim 1 wherein the phasing residues in step (a) are -V, the
"x" in step (f) is 1, and the "y" in step (g) is 3.

12. The method of claim 1 wherein the phasing residues in step (a) are -V, the
"x" in step (f) is 1, and the "y" in step (g) is 4.

13. The method of claim 1 wherein the anchor primers each have 18 T residues
in the tract of T residues.

14. The method of claim 1 wherein the first stuffer segment of the anchor
primers is 14 residues in length.

15. The method of claim 1 wherein the sequence of the first stuffer segment is 
A-A-C-T-G-G-A-A-G-A-A-T-T-C (SEQ ID NO: 1). 

16. The method of claim 1 wherein a second stuffer segment is interposed 
between the site for cleavage by a restriction endonuclease that recognizes more than six 
bases and the tract of T residues. 

17. The method of claim 1 wherein the anchor primers have the sequence
A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-
T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2).

18. The method of claim 1 wherein the anchor primers have the sequence
A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-
T-T-T-T-T-T-T-T-T-T-V (SEQ ID NO: 23).

19. The method of claim 1 wherein the bacteriophage-specific promoter is
selected from the group consisting of T3 promoter, T7 promoter and SP6 promoter.

20. The method of claim 16 wherein the bacteriophage-specific promoter is T3
promoter.

21. The method of claim 1 wherein the sixteen 5'-RT primers for priming of
transcription of cDNA from cRNA have the sequence
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3).

22. The method of claim 1 wherein the sixteen 5'-RT primers for priming of
transcription of cDNA from cRNA have the sequence
G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 7).

23. The method of claim 1 wherein the vector is chosen from the group
consisting of pBC SK+, pBS SK+, pBS SK+/DGT1, pBS SK+/DGT2 and pBS SK+/DGT3.

24. The method of claim 1 wherein the vector is the plasmid pBC SK+
cleaved with ClaI and NotI and the 3'-PCR primer in step (g) is
G-A-A-C-A-A-A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C-G-C (SEQ ID NO: 4).

25. The method of claim 1 wherein the vector is the plasmid pBC SK+ cleaved
with ClaI and NotI and the 3'-PCR primer in step (g) is
A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C (SEQ ID NO: 8).

26. The method of claim 1 wherein the second restriction endonuclease
recognizing a four-nucleotide sequence is MspI.

27. The method of claim 1 wherein the second restriction endonuclease
recognizing a four-nucleotide sequence is selected from the group consisting of MboI,
DpnII, Sau3AI, Tsp509I, HpaII, BfaI, Csp6I, MseI, HhaI, NlaIII, TaqI, MspI, MaeII and
HinP1I.

28. The method of claim 1 wherein the first restriction endonuclease that
recognizes more than six bases is selected from the group consisting of AscI, BaeI, FseI,
NotI, PacI, PmeI, PpuMI, RsrII, SapI, SexAI, SfiI, SgfI, SgrAI, SrfI, Sse8387I and SwaI.

29. The method of claim 1 wherein the first restriction endonuclease that
recognizes more than six bases is NotI.

30. The method of claim 1 wherein the restriction endonuclease used in step (d)
has a nucleotide sequence recognition that includes the four-nucleotide sequence of the
second restriction endonuclease used in step (b).

31. The method of claim 30 wherein the second restriction endonuclease is
MspI and the restriction endonuclease used in step (d) is SmaI.

32. The method of claim 30 wherein the second restriction endonuclease is TaqI
and the restriction endonuclease used in step (d) is XhoI.

33. The method of claim 30 wherein the second restriction endonuclease is
HinP1I and the restriction endonuclease used in step (d) is NarI.

34. The method of claim 30 wherein the second restriction endonuclease is
MaeII and the restriction endonuclease used in step (d) is AatII.

35. The method of claim 1 wherein the vector of step (c) is in the form of a
circular DNA molecule having first and second vector restriction endonuclease sites
flanking a vector stuffer sequence, and further comprising the step of digesting the vector
with restriction endonucleases that cleave the vector at the first and second vector
restriction endonuclease sites.

36. The method of claim 35 wherein the vector stuffer sequence includes an
internal vector stuffer restriction endonuclease site between the first and second vector
restriction endonuclease sites.

37. The method of claim 36 wherein the step (d) includes digestion of the
vector with a restriction endonuclease which cleaves the vector at the internal vector
stuffer restriction endonuclease site.

38. The method of claim 37 wherein the restriction endonuclease used in step
(d) also cleaves the vector at the internal vector stuffer restriction endonuclease site.

39. The method of claim 1 wherein the step of generating linearized fragments
of the cloned inserts comprises: (i) dividing the plasmid containing the insert into two
fractions, a first fraction cleaved with the restriction endonuclease XhoI and a second
fraction cleaved with the restriction endonuclease SalI; (ii) recombining the first and
second fractions after cleavage; (iii) dividing the recombined fractions into thirds and
cleaving the first third with the restriction endonuclease HindIII, the second third with
the restriction endonuclease BamHI, and the third third with the restriction endonuclease
EcoRI; and (iv) recombining the thirds after digestion in order to produce a population of
linearized fragments of which about one-sixth of the population corresponds to the
product of cleavage by each of the possible combinations of enzymes.
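The one-sixth figure in claim 39 follows from pairing each of the two first-round enzymes with each of the three second-round enzymes. A brief sketch of that count is given below. It assumes the fractions are split equally at each step, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

first_round = ["XhoI", "SalI"]                 # step (i): two fractions
second_round = ["HindIII", "BamHI", "EcoRI"]   # step (iii): three fractions

# Every fragment has seen exactly one enzyme from each round.
combinations = list(product(first_round, second_round))

# With equal splits, each combination accounts for 1/2 * 1/3 of the population.
share = Fraction(1, len(first_round)) * Fraction(1, len(second_round))

for pair in combinations:
    print(pair, share)
```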

40. The method of claim 1 wherein the mRNA population has been enriched for
polyadenylated mRNA species.

41. The method of claim 1 wherein the resolving of the amplified fragments in
step (h) is conducted by electrophoresis to display the products.

42. The method of claim 41 wherein the intensity of products displayed after 
electrophoresis is about proportional to the abundances of the mRNAs corresponding to 
the products in the original mixture. 

43. The method of claim 41 further comprising a step of determining the
relative abundance of each mRNA in the original mixture from the intensity of the
product corresponding to that mRNA after electrophoresis.

44. The method of claim 41 wherein the step of resolving the polymerase chain 
reaction amplified fragments by electrophoresis comprises electrophoresis of the 
fragments on at least two gels. 

45. The method of claim 1 wherein the suitable host cell is Escherichia coli.

46. The method of claim 41 further comprising the steps of: 

(i) eluting at least one cDNA corresponding to a mRNA from an 
electropherogram in which bands representing the 3'-ends of mRNAs present in the 
sample are displayed; 
(j) amplifying the eluted cDNA in a polymerase chain reaction;

(k) cloning the amplified cDNA into a plasmid;

(l) producing DNA corresponding to the cloned DNA from the plasmid;
and

(m) sequencing the cloned cDNA.

47. A method for simultaneous sequence-specific identification of mRNAs in an
mRNA population comprising the steps of:

(a) isolating an mRNA population; 

(b) preparing a double-stranded cDNA population from an mRNA
population using a mixture of anchor primers, the anchor primers each including: (i) a
tract of from 7 to 40 T residues; (ii) a site for cleavage by a first restriction endonuclease
that recognizes eight bases, the site for cleavage being located to the 5'-side of the tract of
T residues; (iii) a first stuffer segment of from 4 to 40 nucleotides, the stuffer segment
being located to the 5'-side of the site for cleavage by the first restriction endonuclease;
and (iv) phasing residues chosen from the group consisting of -V and -V-N located at
the 3' end of each of the anchor primers, wherein V is a deoxyribonucleotide selected
from the group consisting of A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T, the mixture including anchor primers containing
all possibilities for V and N;

(c) cleaving the double-stranded cDNA population with the first
restriction endonuclease and with a second restriction endonuclease recognizing a
four-nucleotide sequence, to form a population of double-stranded cDNA molecules having
first and second termini respectively;

(d) inserting the double-stranded cDNA molecules from step (b) each into
a vector in an orientation that is antisense with respect to a T3 promoter within the vector
to form a population of vectors containing the inserted cDNA molecules, the inserting
defining 3' and 5' flanking vector sequences such that 5' is upstream from the sense
strand of the inserted cDNA and 3' is downstream of the sense strand, and the vector
having a 3' flanking nucleotide sequence of from at least 15 nucleotides in length
between the first restriction endonuclease site and a site defining transcription initiation
in the promoter;

(e) transforming Escherichia coli with the vector into which the cleaved
cDNA has been inserted to produce vectors containing cloned inserts;

(f) generating linearized fragments containing the inserted cDNA
molecules by digestion of the vectors produced in step (e) with at least one restriction
endonuclease that does not recognize sequences in the inserted cDNA molecules or in the
T3 promoter;

(g) generating a cRNA preparation of antisense cRNA transcripts by 
incubation of the linearized fragments with a T3 RNA polymerase capable of initiating 
transcription from the T3 promoter; 
(h) dividing the cRNA preparation into subpools and transcribing first-strand
cDNA from each subpool, using a reverse transcriptase and one of the 5'-RT
primers defined as having a 3'-terminus consisting of -N_x, wherein "N" is one of the four
deoxyribonucleotides A, C, G, or T, and "x" is an integer from 1 to 2, the 5'-RT primer
being 15 to 30 nucleotides in length and complementary to the 5' flanking vector
sequence with the 5'-RT primer's complementarity extending across into the insert-specific
nucleotides of the cRNA in a number of nucleotides equal to "x", wherein a
different one of the primers is used in different subpools and wherein there are 4
subpools if "x" = 1 and 16 subpools if "x" = 2;

(i) using the product of first-strand cDNA transcription in each of the
subpools as a template for a polymerase chain reaction with a 3'-PCR primer of 15 to 30
nucleotides in length that is complementary to 3' flanking vector sequences between the
first restriction endonuclease site and the site defining transcription initiation by the T3
promoter and a 5'-PCR primer having a 3'-terminus consisting of -N_x-N_y, where "N"
and "x" are as in step (h), -N_x is the same sequence as in the 5'-RT primer from which
first-strand cDNA was made for that subpool, and "y" is a whole integer such that x + y
equals an integer selected from the group consisting of 4 and 5, the 5'-PCR primer being
15 to 30 nucleotides in length and complementary to the 5' flanking vector sequence with
the 5'-PCR primer's complementarity extending across into the insert-specific
nucleotides of the cRNA in a number of nucleotides equal to "x + y", to produce
polymerase chain reaction amplified fragments; and

(j) resolving the polymerase chain reaction amplified fragments to 
generate a display of products representing the 3'-ends of different mRNAs present in the 
mRNA population. 

48. The method of claim 47 wherein each of the primers in the mixture of
anchor primers have the sequence A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-
C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2).

49. The method of claim 47 wherein each of the primers in the mixture of 
anchor primers have the sequence A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G- 
C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V (SEQ ID NO: 23). 

50. The method of claim 47 wherein the first restriction endonuclease is MspI
and the second restriction endonuclease is NotI.

51. The method of claim 47 wherein the 5'-RT primer in step (h) is G-G-T-C- 
G-A-C-G-G-T-A-T-C-G-G-N (SEQ ID NO: 9). 

52. The method of claim 47 wherein the 3'-PCR primer is
A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C (SEQ ID NO: 8).

53. The method of claim 47 wherein the "y" in step (i) is 3. 

54. The method of claim 47 wherein the "y" in step (i) is 4. 

55. A method for detecting a change in the pattern of mRNA expression in a 
tissue associated with a physiological or pathological change comprising the steps of:

(a) obtaining a first sample of a tissue that is not subject to the 
physiological or pathological change; 

(b) isolating an mRNA population from the first sample; 

(c) determining the pattern of mRNA expression in the first sample of the 
tissue by performing steps (a)-(h) of claim 1 to generate a first display of sequence- 
specific products representing the 3'-ends of mRNAs present in the first sample; 

(d) obtaining a second sample of the tissue that has been subject to the 
physiological or pathological change; 

(e) isolating an mRNA population from the second sample; 

(f) determining the pattern of mRNA expression in the second sample of 
the tissue by performing steps (a)-(h) of claim 1 to generate a second display of 
sequence-specific products representing the 3'-ends of mRNAs present in the second 
sample; and 

(g) comparing the first and second displays to determine the effect of the 
physiological or pathological change on the pattern of mRNA expression in the tissue. 
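The comparison in step (g) can be pictured as a set comparison between two displays. The sketch below is illustrative only: it assumes each band is identified by a primer-pair label and a fragment length, and the band lists are made-up values, not data from the Example.

```python
Band = tuple[str, int]  # (hypothetical primer-pair label, fragment length in nt)


def compare_displays(first: set[Band], second: set[Band]) -> dict[str, set[Band]]:
    """Partition bands into those unique to each display and those shared."""
    return {
        "only_first": first - second,
        "only_second": second - first,
        "shared": first & second,
    }


# Made-up band lists for illustration only; they are not data from the Example.
unchanged_tissue = {("118-111", 210), ("118-111", 305), ("118-127", 150)}
changed_tissue = {("118-111", 210), ("118-127", 150), ("118-127", 420)}

result = compare_displays(unchanged_tissue, changed_tissue)
print(result["only_first"])   # bands absent after the change
print(result["only_second"])  # bands appearing after the change
```

A fuller comparison would also weigh band intensities, which the description states are roughly proportional to mRNA prevalence; this sketch considers only presence or absence.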

56. The method of claim 55 wherein the tissue is derived from the central
nervous system.

57. The method of claim 56 wherein the physiological or pathological change is 
selected from the group consisting of Alzheimer's disease, parkinsonism, ischemia, 
alcohol addiction, drug addiction, schizophrenia, amyotrophic lateral sclerosis, multiple 
sclerosis, depression, and bipolar manic-depressive disorder. 

58. The method of claim 56 wherein the physiological or pathological change is 
associated with learning or memory, emotion, glutamate neurotoxicity, feeding behavior, 
olfaction, vision, movement disorders, viral infection, electroshock therapy, or the 
administration of a drug or toxin. 

59. The method of claim 56 wherein the physiological or pathological change is 
selected from the group consisting of circadian variation, aging, and long term 
potentiation. 

60. The method of claim 56 wherein the tissue is derived from a structure 
within the central nervous system selected from the group consisting of retina, cerebral 
cortex, olfactory bulb, thalamus, hypothalamus, anterior pituitary, posterior pituitary, 
hippocampus, nucleus accumbens, amygdala, striatum, cerebellum, brain stem, 
suprachiasmatic nucleus, and spinal cord. 

61. The method of claim 55 wherein the tissue is from an organ or organ 
system selected from the group consisting of the cardiovascular system, the pulmonary 
system, the digestive system, the peripheral nervous system, the liver, the kidney, skeletal 
muscle, and the reproductive system. 
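
The comparison in step (g) of claim 55 can be illustrated with a minimal sketch. It assumes that each display has already been reduced to a table of peak areas keyed by a hypothetical (subpool, product length) pair. The function name, the fold-change threshold and the noise floor are assumptions made for illustration only; none of them is recited in the claims.

```python
from typing import Dict, List, Tuple

# Hypothetical key for one band: (subpool identifier, product length in nucleotides).
Address = Tuple[str, int]


def compare_displays(
    first: Dict[Address, float],
    second: Dict[Address, float],
    min_fold: float = 2.0,   # assumed threshold, not taken from the claims
    floor: float = 1.0,      # assumed noise floor that avoids division by zero
) -> List[Tuple[Address, float, float]]:
    """Return (address, first area, second area) for bands that differ by min_fold or more."""
    changed = []
    for address in sorted(set(first) | set(second)):
        area_first = first.get(address, 0.0)
        area_second = second.get(address, 0.0)
        ratio = max(area_second, floor) / max(area_first, floor)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            changed.append((address, area_first, area_second))
    return changed
```

Bands present in only one display are reported as well, because a missing band is treated as an area of zero before the floor is applied.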

62. A method of detecting a difference in action of a drug to be screened and a 
known compound comprising the steps of:

(a) obtaining a first sample of tissue from an organism treated with a 
compound of known physiological function; 

(b) isolating an mRNA population from the first sample; 

(c) determining the pattern of mRNA expression in the first sample of the 
tissue by performing steps (a)-(h) of claim 1 to generate a first display of sequence- 
specific products representing the 3'-ends of mRNAs present in the first sample; 

(d) obtaining a second sample of tissue from an organism treated with a
drug to be screened for a difference in action of the drug and the known compound;

(e) isolating an mRNA population from the second sample;

(f) determining the pattern of mRNA expression in the second sample of 
the tissue by performing steps (a)-(h) of claim 1 to generate a second display of 
sequence-specific products representing the 3'-ends of mRNAs present in the second 
sample; and 

(g) comparing the first and second displays in order to detect the presence 
of mRNA species whose expression is not affected by the known compound but is 
affected by the drug to be screened, thereby indicating a difference in action of the drug 
to be screened and the known compound. 

63. The method of claim 62 wherein the drug to be tested is selected from the 
group consisting of antidepressants, neuroleptics, tranquilizers, anticonvulsants, 
monoamine oxidase inhibitors, and stimulants. 

64. The method of claim 62 wherein the drug to be tested is selected from the 
group consisting of anti-parkinsonism agents, skeletal muscle relaxants, analgesics, local 
anesthetics, cholinergics, antiviral agents, antispasmodics, steroids, and non-steroidal
anti-inflammatory drugs.

65. A degenerate mixture of anchor primers comprising a mixture of 12 primers 
of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-
T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a
deoxyribonucleotide selected from the group consisting of A, C, and G; and N is a 
deoxyribonucleotide selected from the group consisting of A, C, G, and T, each of the 12 
primers being present in about an equimolar quantity. 
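
The count of 12 primers in claim 65 follows from the two degenerate 3' positions: three choices for V times four choices for N. The sketch below expands SEQ ID NO: 2 into its individual members. The helper name `expand` is hypothetical; the V and N code tables are those defined in the claim.

```python
from itertools import product

# Degenerate positions as defined in claim 65.
DEGENERATE = {"V": "ACG", "N": "ACGT"}

# SEQ ID NO: 2: a 27-nucleotide 5' portion, 18 T residues, then V and N.
ANCHOR = "AACTGGAAGAATTCGCGGCCGCAGGAA" + "T" * 18 + "VN"


def expand(degenerate: str) -> list:
    """List every fully specified primer encoded by a degenerate sequence."""
    choices = [DEGENERATE.get(base, base) for base in degenerate]
    return ["".join(combo) for combo in product(*choices)]


primers = expand(ANCHOR)  # 3 x 4 = 12 primers, each 47 nucleotides long
```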

66. A database comprising the data produced by the quantitation of the 
display of sequence-specific products of claim 1 to yield a digital address, said digital 
address being defined as a sequence identifier, the length of the sequence-specific PCR 
product in nucleotide residues and the intensity of labeling of the PCR product, defined 
as the area under the peak of the detector output for that PCR product. 
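
One possible in-memory form of the digital address of claim 66 is sketched below. The class and field names are hypothetical, and the trapezoidal rule is only one assumed way of obtaining the area under a detector peak; the claim does not prescribe an integration method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DigitalAddress:
    sequence_id: str   # sequence identifier (format assumed for illustration)
    length_nt: int     # length of the PCR product in nucleotide residues
    intensity: float   # area under the detector peak for that product


def peak_area(signal: Sequence[float], spacing: float = 1.0) -> float:
    """Trapezoidal area under evenly spaced detector readings."""
    if len(signal) < 2:
        return 0.0
    return spacing * (sum(signal) - 0.5 * (signal[0] + signal[-1]))
```

A record could then be built as `DigitalAddress("CCGGACGT", 212, peak_area(readings))`, where the identifier string and the length are made-up values.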

67. The database of claim 66 further comprising data selected from the group 
consisting of sequence relationships, gene mapping and cellular distributions. 

68. The method of claim 1 further comprising the steps of:

(i) eluting at least one cDNA corresponding to a mRNA from an
electropherogram in which bands representing the 3'-ends of mRNAs present in the
sample are displayed;

(j) amplifying the eluted cDNA in a polymerase chain reaction;

(k) cloning the amplified cDNA into a plasmid;

(l) producing DNA corresponding to the cloned DNA from the plasmid;

(m) determining the sequence of the cloned cDNA;

(n) determining corresponding nucleotide sequences from a database of
nucleotide sequences, said corresponding nucleotide sequences being delimited by the
most distal recognition site for the second endonuclease and the beginning of the poly(A)
tail; and

(o) comparing the sequence of the cloned cDNA to the corresponding
nucleotide sequences, thereby recognizing sequence identities and similarities between
the sequence of 3'-ends of mRNA molecules present in a sample and a database of
sequences.

69. The method of claim 68 further comprising the step of:

(p) comparing the length and amount of the PCR products in a two
dimensional graphical display.

70. The method of claim 69 further comprising the steps of:

(q) determining the expected length of the corresponding nucleotide
sequence, which is equal to the sum of the lengths of the corresponding nucleotide
sequence determined from the database, the length of the 5' PCR sequence hybridizable to
vector sequence, the length of the remaining anchor primer sequence, an intervening
segment of vector sequence and the length of the 3' PCR sequence hybridizable to vector
sequence; and

(r) comparing the length of the PCR product to the determined expected
length of the corresponding nucleotide sequence, wherein the expected length of
corresponding nucleotide sequence is indicated in the two dimensional graphical display
by the use of a graphical symbol or text character.
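
The arithmetic of steps (q) and (r) of claim 70 can be sketched as follows. The function and parameter names are hypothetical, and the one-nucleotide tolerance used when comparing lengths is an assumption; the claim itself does not state a tolerance.

```python
def expected_product_length(
    database_fragment_nt: int,    # corresponding sequence found in the database
    five_prime_primer_nt: int,    # 5' PCR sequence hybridizable to vector sequence
    anchor_remainder_nt: int,     # remaining anchor primer sequence
    vector_segment_nt: int,       # intervening segment of vector sequence
    three_prime_primer_nt: int,   # 3' PCR sequence hybridizable to vector sequence
) -> int:
    """Sum the five length components listed in step (q)."""
    return (
        database_fragment_nt
        + five_prime_primer_nt
        + anchor_remainder_nt
        + vector_segment_nt
        + three_prime_primer_nt
    )


def lengths_agree(observed_nt: int, expected_nt: int, tolerance_nt: int = 1) -> bool:
    """Step (r): compare an observed product length with the expected length."""
    return abs(observed_nt - expected_nt) <= tolerance_nt
```

With made-up component lengths of 212, 16, 25, 8 and 15 nucleotides, the expected product length is 276 nucleotides.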

71. A method for recognizing sequence identities and similarities between the
sequence of a cDNA fragment corresponding to a mRNA molecule present in a sample
and a database of sequences, comprising the steps of:

eluting a cDNA fragment corresponding to a mRNA molecule present in a
sample;

amplifying the eluted cDNA fragment in a polymerase chain reaction to produce
amplified cDNA fragment;

cloning the amplified cDNA fragment into a plasmid;

producing a DNA molecule corresponding to the cloned cDNA fragment;

sequencing the produced DNA molecule, thereby determining the sequence of the
eluted cDNA fragment; and

comparing the sequence of the eluted cDNA fragment to the sequences in a
database thereby recognizing sequence identities and similarities.

72. The method of claim 71 wherein the step of comparing the sequence of
the eluted cDNA fragment to the sequences in a database is performed using a computer.

73. The method of claim 72 comprising the additional step of displaying the 
results of the comparison graphically. 

74. A method for recognizing sequence identities and similarities between the
sequence of a cDNA fragment corresponding to a mRNA molecule present in a sample
and a database of sequences, comprising the steps of:

eluting a cDNA fragment corresponding to a mRNA molecule present in a
sample, where the cDNA fragment has a length determined by the position of a
restriction endonuclease recognition site and a poly(A) tail of the mRNA molecule;

determining a partial sequence of the cDNA fragment by performing a
polymerase chain reaction with a 5' PCR primer corresponding to the sequence of the
restriction endonuclease recognition site; and

comparing the determined partial sequence of the eluted cDNA fragment and the
length of the cDNA fragment to the sequences in a database thereby recognizing
sequence identities and similarities.

75. A method of producing a transformed polynucleotide sequence database
entry, comprising the steps of:

choosing a source sequence from a polynucleotide sequence database entry; 

locating a poly(A) tail sequence within the source sequence; 

locating an endonuclease recognition site sequence within the source sequence
that is closest to the first recognition site;

determining an index sequence consisting of about two to about six nucleotides
adjacent to the endonuclease recognition site;

determining a correlate sequence within the source sequence, said correlate
sequence including the sequence bounded by the poly(A) tail and the endonuclease
recognition site and including at least part of the endonuclease recognition site;

determining the length of the correlate sequence; and

storing information concerning the location and sequence of the poly(A) tail, the
location and sequence of the endonuclease recognition site, and the length of the
correlate sequence in relation to the source sequence, thereby producing a transformed
database entry.

76. The method of claim 75 further comprising the step of: 

displaying graphically the length of the correlate sequence in relation to the index 
sequence. 

77. The method of claim 76 wherein the restriction endonuclease is chosen
from the group consisting of MspI, TaqI and HinP1I.
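
The steps of claim 75 can be sketched for a single database entry as shown below. Several points are assumptions made for illustration: the source sequence is taken to end in its poly(A) tail, a tail is recognized as a terminal run of at least 12 A residues, the recognition site used is the occurrence nearest the tail, the index sequence is taken from the nucleotides immediately 3' of the site, and the correlate sequence includes the whole site. The default site CCGG is the MspI recognition sequence. All class, function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransformedEntry:
    poly_a_start: int      # 0-based index where the poly(A) tail begins
    site_start: int        # 0-based index of the recognition site
    site: str              # recognition site sequence
    index_sequence: str    # nucleotides immediately 3' of the site
    correlate_length: int  # from the start of the site to the start of the tail


def transform_entry(
    source: str, site: str = "CCGG", index_len: int = 4, min_tail: int = 12
) -> Optional[TransformedEntry]:
    """Build a transformed entry, or return None if no tail or site is found."""
    body = source.rstrip("A")
    if len(source) - len(body) < min_tail:
        return None  # no terminal poly(A) run of sufficient length
    tail = len(body)
    pos = source.rfind(site, 0, tail)  # site occurrence nearest the tail
    if pos < 0:
        return None
    index_start = pos + len(site)
    index_sequence = source[index_start:min(index_start + index_len, tail)]
    return TransformedEntry(tail, pos, site, index_sequence, tail - pos)
```

The stored correlate length can then be plotted against the index sequence, as in claim 76.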

78. A method of improving resolution of the length and amount of PCR
products by diminishing background that is due to amplification of untargeted cDNAs
comprising the steps of:

selecting a sample of a cRNA population, wherein each cRNA molecule
comprises insert sequence and vector-derived sequence;

performing reverse transcription using a reverse transcription primer that
hybridizes to the vector-derived sequence and that extends about five nucleotides to
about six nucleotides into the insert sequence to produce a cDNA reverse transcription
product;

subdividing the cDNA reverse transcription product;

performing at least one polymerase chain reaction using the subdivided cDNA
reverse transcription product, a 3' PCR primer and a 5' PCR primer that hybridizes to the
vector-derived sequence and extends about seven nucleotides to about nine nucleotides
into the insert sequence to produce a PCR product, thereby diminishing background that
is due to amplification of untargeted cDNAs.

79. The method of claim 78 wherein there are sixteen pools of reverse 
transcription reactions and there are 16 different reverse transcription primers. 

80. The method of claim 79 wherein there are 4^X subpools of polymerase
chain reactions, where X is the difference between the number of nucleotides that the 5'
PCR primer extends into the insert sequence and the number of nucleotides that the
reverse transcription primer extends into the insert sequence.
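
The subpool count of claim 80 can be illustrated numerically, reading the count as 4 raised to the power X, one factor of four for each additional nucleotide by which the 5' PCR primer extends past the reverse transcription primer. The function names below are hypothetical.

```python
from itertools import product


def subpool_count(pcr_extension_nt: int, rt_extension_nt: int) -> int:
    """Number of PCR subpools, 4**X, with X the difference in primer extension."""
    x = pcr_extension_nt - rt_extension_nt
    if x < 0:
        raise ValueError("the 5' PCR primer must extend at least as far as the RT primer")
    return 4 ** x


def extensions(x: int) -> list:
    """All possible X-nucleotide extensions, one per subpool."""
    return ["".join(bases) for bases in product("ACGT", repeat=x)]
```

For example, a 5' PCR primer extending seven nucleotides into the insert and a reverse transcription primer extending five nucleotides give X = 2 and 16 subpools for each reverse transcription pool.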

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANTS: Sutcliffe, J. G.
Erlander, Mark G.
Hasel, Karl W.

(ii) TITLE OF INVENTION: Method for Simultaneous
Identification of Differentially Expressed mRNAs and
Measurement of Relative Concentrations

(iii) NUMBER OF SEQUENCES: 27

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: The Scripps Research Institute
(B) STREET: 10550 North Torrey Pines Road
(C) CITY: La Jolla
(D) STATE: California
(E) COUNTRY: USA
(F) ZIP: 92037

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Fitting, Thomas
(B) REGISTRATION NUMBER: 34,163
(C) REFERENCE/DOCKET NUMBER: TSRI 401.1 CIP

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (858) 784-2937
(B) TELEFAX: (858) 784-9399



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AACTGGAAGA ATTC 14

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 47 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

AACTGGAAGA ATTCGCGGCC GCAGGAATTT TTTTTTTTTT TTTTTVN 47

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AGGTCGACGG TATCGGNN 18

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GAACAAAAGC TGGAGCTCCA CCGC 24



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

AGGTCGACGG TATCGGNNN 19



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AGGTCGACGG TATCGGNNNN 20



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GTCGACGGTA TCGGNN 16

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AAGCTGGAGC TCCACC 16



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GGTCGACGGT ATCGGN 16



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CTTCAGTCAG GCTAATCGGN 20



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TTCAGTCAGG CTAATCGGNN 20

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TCAGTCAGGC TAATCGGNNN 20



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

TCGACGGTAT CGGNNN 16



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

CGACGGTATC GGNNNN 16



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GACGGTATCG GNNNNN 16

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

ACGGTATCGG NNNNNN 16



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CAGTCAGGCT AATCGGNNNN 20

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

GAGCTCCACC GCGGT 15

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

GAGCTCGTTT TCCCAG 16

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 116 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

GAGCTCCACC GCGGTGTCAC GACTATCTGC GGCCGCATGC CCGGGAATGG CGCCTCGAGA 60
CGTCTTTATC GATACCGTCG ACCTCGAACT CGAGACGTCC CGGGCGCCTA GGTACC 116

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 113 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

GAGCTCGTTT TCCCAGTCAC GACTATCTGC GGCCGCATGC CCGGGAATGG CGCCTCGAGA 60
CGTTATCGAT TAGCCTGACT GAAGACTCGA GACGTCCCGG GCGCCTAGGT ACC 113

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 113 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

GAGCTCGTTT TCCCAGTCAC GACTATCTGC GGCCGCATGC CCGGGAATGG CGCCTCGAGA 60
CGTCTATATC GATTAGCCTG ACTGAAGACT CGAGACGTCC CGGGCTAGGT ACC 113



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

AACTGGAAGA ATTCGCGGCC GCAGGAATTT TTTTTTTTTT TTTTTV 46

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

AGGTCGACGG TATCGGNNNN N 21

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

AGGTCGACGG TATCGGNNNN NN 22



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

AGTCAGGCTA ATCGGNNNNN 20

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Synthetic primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

GTCAGGCTAA TCGGNNNNNN 20